[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...

2017-02-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16736
  
retest this please





[GitHub] spark issue #16850: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16850
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16850: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16850
  
**[Test build #3563 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3563/testReport)**
 for PR 16850 at commit 
[`5025cb7`](https://github.com/apache/spark/commit/5025cb7511a43e24cb3a181eb7b06c69b024479f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16386#discussion_r100104738
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
 ---
@@ -298,22 +312,22 @@ class JacksonParser(
 // Here, we pass empty `PartialFunction` so that this case can be
 // handled as a failed conversion. It will throw an exception as
 // long as the value is not null.
-parseJsonToken(parser, dataType)(PartialFunction.empty[JsonToken, Any])
+parseJsonToken[AnyRef](parser, dataType)(PartialFunction.empty[JsonToken, AnyRef])
   }
 
   /**
    * This method skips any leading `FIELD_NAME` tokens and handles nulls
    * before trying to parse the JSON token with the given function `f`.
    * If `f` fails to parse and convert the token, `failedConversion` is
    * called to handle it.
    */
-  private def parseJsonToken(
+  private def parseJsonToken[R >: Null](
--- End diff --

It states that `R` must be a nullable type. This enables `null: R` to 
compile and is preferable to the runtime cast `null.asInstanceOf[R]` because it 
is verified at compile time.
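
A minimal standalone sketch of the point (the names are illustrative, not the real Spark signatures):

```scala
import com.fasterxml.jackson.core.JsonToken

// The lower bound `R >: Null` tells the compiler that R is a nullable
// reference type, so `null: R` type-checks without a runtime cast.
def parseOrNull[R >: Null](token: JsonToken)(f: PartialFunction[JsonToken, R]): R = {
  if (token == JsonToken.VALUE_NULL) {
    null // legal because the bound guarantees Null <: R
  } else {
    // fall back to null when f has no case for this token
    f.applyOrElse(token, (_: JsonToken) => null)
  }
}

// Without the bound, `def parseOrNull[R](...)` would need the unchecked
// `null.asInstanceOf[R]`, which the compiler cannot verify.
```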





[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16386#discussion_r100103739
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
 ---
@@ -227,66 +267,71 @@ class JacksonParser(
   }
 
 case TimestampType =>
-  (parser: JsonParser) => parseJsonToken(parser, dataType) {
+  (parser: JsonParser) => parseJsonToken[java.lang.Long](parser, dataType) {
 case VALUE_STRING =>
+  val stringValue = parser.getText
   // This one will lose microseconds parts.
   // See https://issues.apache.org/jira/browse/SPARK-10681.
-  Try(options.timestampFormat.parse(parser.getText).getTime * 1000L)
-.getOrElse {
-  // If it fails to parse, then tries the way used in 2.0 and 1.x for backwards
-  // compatibility.
-  DateTimeUtils.stringToTime(parser.getText).getTime * 1000L
-}
+  Long.box {
--- End diff --

This is needed to satisfy the type checker. The alternative is to specify the 
type explicitly in two places: 
`Try[java.lang.Long](...).getOrElse[java.lang.Long](...)`. I found explicit 
boxing more readable than that.
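
A small sketch of the inference issue outside Spark (`parseMillis` is a hypothetical stand-in for `options.timestampFormat.parse(...).getTime`):

```scala
import scala.util.Try

// Hypothetical stand-in for the actual timestamp parsing.
def parseMillis(s: String): Long = s.toLong

// The partial function in the real code must produce java.lang.Long (a
// nullable AnyRef). Two ways to make the inference work:

// (a) annotate the element type in two places:
def viaAnnotations(s: String): java.lang.Long =
  Try[java.lang.Long](parseMillis(s) * 1000L).getOrElse[java.lang.Long](0L)

// (b) what the patch does: compute with primitive longs throughout and
// box once at the end, so there is a single conversion point.
def viaBoxing(s: String): java.lang.Long =
  Long.box {
    Try(parseMillis(s) * 1000L).getOrElse(0L)
  }
```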





[GitHub] spark issue #16852: [SPARK-19512][SQL] codegen for compare structs fails

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16852
  
**[Test build #3565 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3565/testReport)**
 for PR 16852 at commit 
[`9a8d853`](https://github.com/apache/spark/commit/9a8d8537748f38a4276188b3f60f6852010e3387).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16852: [SPARK-19512][SQL] codegen for compare structs fails

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16852
  
**[Test build #3565 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3565/testReport)**
 for PR 16852 at commit 
[`9a8d853`](https://github.com/apache/spark/commit/9a8d8537748f38a4276188b3f60f6852010e3387).





[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16386#discussion_r100101464
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -160,7 +164,17 @@ public void writeTo(OutputStream out) throws IOException {
 throw new ArrayIndexOutOfBoundsException();
   }
 
-  out.write(bytes, (int) arrayOffset, numBytes);
+  return ByteBuffer.wrap(bytes, (int) arrayOffset, numBytes);
+} else {
+  return null;
--- End diff --

It will allocate an extra object but simplifies the calling code... since it 
would be a short-lived allocation, it's probably fine to do this.
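
A rough sketch of the pattern (Scala; hypothetical names, not the actual UTF8String internals), showing why the extra allocation is cheap:

```scala
import java.nio.ByteBuffer

// ByteBuffer.wrap allocates only a small view object over the existing
// array, with no copy of the underlying bytes, so the extra allocation
// is cheap and short-lived.
def toByteBuffer(bytes: Array[Byte], offset: Int, numBytes: Int): ByteBuffer =
  if (bytes != null && offset + numBytes <= bytes.length) {
    ByteBuffer.wrap(bytes, offset, numBytes) // zero-copy view
  } else {
    null // mirrors the diff: callers fall back to another path
  }
```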





[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16386#discussion_r100100641
  
--- Diff: 
core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---
@@ -194,5 +195,8 @@ class PortableDataStream(
   }
 
   def getPath(): String = path
+
+  @Since("2.2.0")
--- End diff --

This is a public class, so I thought adding a `since` tag would benefit the 
documentation. If it's not desired I can certainly remove it.

As for making the lazy val public vs. private: I'm following the style already 
used in the class, which has a public get method for each private field. I'm 
not partial to either approach but prefer to keep it consistent.
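
A sketch of the convention being discussed, with hypothetical names:

```scala
import org.apache.spark.annotation.Since

// Mirrors the convention in PortableDataStream: private state with a
// public get method for each field, and a @Since tag on the newly added
// accessor so it shows up in the generated docs.
class StreamLike(private val path: String) {
  def getPath(): String = path // existing style in the class

  private lazy val configuration: String = "..." // placeholder value

  @Since("2.2.0")
  def getConfiguration(): String = configuration // new, tagged accessor
}
```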





[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16386#discussion_r100099791
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
 ---
@@ -31,10 +31,17 @@ import 
org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, CompressionCodecs
  * Most of these map directly to Jackson's internal options, specified in 
[[JsonParser.Feature]].
  */
 private[sql] class JSONOptions(
-@transient private val parameters: CaseInsensitiveMap)
+@transient private val parameters: CaseInsensitiveMap,
+defaultColumnNameOfCorruptRecord: String)
--- End diff --

Previously the `JSONOptions` instance was always passed around alongside a 
`columnNameOfCorruptRecord` value. This change makes it a field in `JSONOptions` 
instead, putting all the options in one place. Since it's a required option, it 
made more sense to use a field than to make it an entry in the `CaseInsensitiveMap`.
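
A condensed sketch of the before/after (only the field names match the diff; a plain Map stands in for `CaseInsensitiveMap`):

```scala
// Before: the corrupt-record column name travelled separately, or had to
// be remembered as a map entry at every construction site.
class JSONOptionsBefore(parameters: Map[String, String]) {
  val columnNameOfCorruptRecord: Option[String] =
    parameters.get("columnNameOfCorruptRecord")
}

// After: the required option is a constructor field, so it always travels
// with the options object and can never be silently missing.
class JSONOptionsAfter(
    parameters: Map[String, String],
    defaultColumnNameOfCorruptRecord: String) {
  val columnNameOfCorruptRecord: String = parameters
    .getOrElse("columnNameOfCorruptRecord", defaultColumnNameOfCorruptRecord)
}
```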





[GitHub] spark pull request #11760: [SPARK-13931] Resolve stage hanging up problem in...

2017-02-08 Thread GavinGavinNo1
Github user GavinGavinNo1 closed the pull request at:

https://github.com/apache/spark/pull/11760





[GitHub] spark issue #11760: [SPARK-13931] Resolve stage hanging up problem in a part...

2017-02-08 Thread GavinGavinNo1
Github user GavinGavinNo1 commented on the issue:

https://github.com/apache/spark/pull/11760
  
@kayousterhout I ran into a git conflict, so I created a new branch and a new 
pull request; please refer to https://github.com/apache/spark/pull/16855. I'm 
closing this pull request for the time being. Thanks!





[GitHub] spark issue #16855: [SPARK-13931] Resolve stage hanging up problem in a part...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16855
  
Can one of the admins verify this patch?





[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16386#discussion_r100098008
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 ---
@@ -1764,4 +1769,125 @@ class JsonSuite extends QueryTest with 
SharedSQLContext with TestJsonData {
 val df2 = spark.read.option("PREfersdecimaL", "true").json(records)
 assert(df2.schema == schema)
   }
+
+  test("SPARK-18352: Parse normal multi-line JSON files (compressed)") {
+withTempDir { dir =>
+  dir.delete()
+  val path = dir.getCanonicalPath
+  primitiveFieldAndType
+.toDF("value")
+.write
+.option("compression", "GzIp")
+.text(path)
+
+  new File(path).listFiles() match {
+case compressedFiles =>
+  assert(compressedFiles.exists(_.getName.endsWith(".gz")))
+  }
+
+  val jsonDF = spark.read.option("wholeFile", true).json(path)
+  val jsonDir = new File(dir, "json").getCanonicalPath
+  jsonDF.coalesce(1).write
+.format("json")
+.option("compression", "gZiP")
+.save(jsonDir)
+
+  new File(jsonDir).listFiles() match {
+case compressedFiles =>
+  assert(compressedFiles.exists(_.getName.endsWith(".json.gz")))
+  }
+
+  val jsonCopy = spark.read
+.format("json")
+.load(jsonDir)
+
+  assert(jsonCopy.count === jsonDF.count)
+  val jsonCopySome = jsonCopy.selectExpr("string", "long", "boolean")
+  val jsonDFSome = jsonDF.selectExpr("string", "long", "boolean")
+  checkAnswer(jsonCopySome, jsonDFSome)
+}
+  }
+
+  test("SPARK-18352: Parse normal multi-line JSON files (uncompressed)") {
+withTempDir { dir =>
+  dir.delete()
+  val path = dir.getCanonicalPath
+  primitiveFieldAndType
+.toDF("value")
+.write
+.text(path)
+
+  val jsonDF = spark.read.option("wholeFile", true).json(path)
+  val jsonDir = new File(dir, "json").getCanonicalPath
+  jsonDF.coalesce(1).write
+.format("json")
+.save(jsonDir)
+
+  val compressedFiles = new File(jsonDir).listFiles()
+  assert(compressedFiles.exists(_.getName.endsWith(".json")))
+
+  val jsonCopy = spark.read
+.format("json")
+.load(jsonDir)
+
+  assert(jsonCopy.count === jsonDF.count)
+  val jsonCopySome = jsonCopy.selectExpr("string", "long", "boolean")
+  val jsonDFSome = jsonDF.selectExpr("string", "long", "boolean")
+  checkAnswer(jsonCopySome, jsonDFSome)
+}
+  }
+
+  test("SPARK-18352: Expect one JSON document per file") {
+// the json parser terminates as soon as it sees a matching END_OBJECT 
or END_ARRAY token.
+// this might not be the optimal behavior but this test verifies that 
only the first value
+// is parsed and the rest are discarded.
+
+// alternatively the parser could continue parsing following objects, 
which may further reduce
+// allocations by skipping the line reader entirely
+
+withTempDir { dir =>
+  dir.delete()
+  val path = dir.getCanonicalPath
+  primitiveFieldAndType
+.flatMap(Iterator.fill(3)(_) ++ Iterator("\n{invalid}"))
--- End diff --

sure





[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread NathanHowell
Github user NathanHowell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16386#discussion_r100097749
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala
 ---
@@ -0,0 +1,213 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.json
+
+import java.io.InputStream
+
+import scala.reflect.ClassTag
+
+import com.fasterxml.jackson.core.{JsonFactory, JsonParser}
+import com.google.common.io.ByteStreams
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, Path}
+import org.apache.hadoop.io.{LongWritable, Text}
+import org.apache.hadoop.mapreduce.Job
+import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, 
TextInputFormat}
+
+import org.apache.spark.TaskContext
+import org.apache.spark.input.{PortableDataStream, StreamInputFormat}
+import org.apache.spark.rdd.{BinaryFileRDD, RDD}
+import org.apache.spark.sql.{AnalysisException, SparkSession}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.json.{CreateJacksonParser, 
JacksonParser, JSONOptions}
+import org.apache.spark.sql.execution.datasources.{CodecStreams, 
HadoopFileLinesReader, PartitionedFile}
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.util.Utils
+
+/**
+ * Common functions for parsing JSON files
+ * @tparam T A datatype containing the unparsed JSON, such as [[Text]] or 
[[String]]
+ */
+abstract class JsonDataSource[T] extends Serializable {
+  def isSplitable: Boolean
+
+  /**
+   * Parse a [[PartitionedFile]] into 0 or more [[InternalRow]] instances
+   */
+  def readFile(
+conf: Configuration,
+file: PartitionedFile,
+parser: JacksonParser): Iterator[InternalRow]
+
+  /**
+   * Create an [[RDD]] that handles the preliminary parsing of [[T]] 
records
+   */
+  protected def createBaseRdd(
+sparkSession: SparkSession,
+inputPaths: Seq[FileStatus]): RDD[T]
+
+  /**
+   * A generic wrapper to invoke the correct [[JsonFactory]] method to 
allocate a [[JsonParser]]
+   * for an instance of [[T]]
+   */
+  def createParser(jsonFactory: JsonFactory, value: T): JsonParser
+
+  final def infer(
+  sparkSession: SparkSession,
+  inputPaths: Seq[FileStatus],
+  parsedOptions: JSONOptions): Option[StructType] = {
+if (inputPaths.nonEmpty) {
+  val jsonSchema = InferSchema.infer(
+createBaseRdd(sparkSession, inputPaths),
+parsedOptions,
+createParser)
+  checkConstraints(jsonSchema)
+  Some(jsonSchema)
+} else {
+  None
+}
+  }
+
+  /** Constraints to be imposed on schema to be stored. */
+  private def checkConstraints(schema: StructType): Unit = {
+if (schema.fieldNames.length != schema.fieldNames.distinct.length) {
+  val duplicateColumns = schema.fieldNames.groupBy(identity).collect {
+case (x, ys) if ys.length > 1 => "\"" + x + "\""
+  }.mkString(", ")
+  throw new AnalysisException(s"Duplicate column(s) : 
$duplicateColumns found, " +
+s"cannot save to JSON format")
+}
+  }
+}
+
+object JsonDataSource {
+  def apply(options: JSONOptions): JsonDataSource[_] = {
+if (options.wholeFile) {
+  WholeFileJsonDataSource
+} else {
+  TextInputJsonDataSource
+}
+  }
+
+  /**
+   * Create a new [[RDD]] via the supplied callback if there is at least 
one file to process,
+   * otherwise an [[org.apache.spark.rdd.EmptyRDD]] will be returned.
+   */
+  def createBaseRddConf[T : ClassTag](
--- End diff --

A habit from working with languages that don't support overloading; I'll 
change this.



[GitHub] spark pull request #16855: [SPARK-13931] Resolve stage hanging up problem in...

2017-02-08 Thread GavinGavinNo1
GitHub user GavinGavinNo1 reopened a pull request:

https://github.com/apache/spark/pull/16855

[SPARK-13931] Resolve stage hanging up problem in a particular case

## What changes were proposed in this pull request?
When the function 'executorLost' is invoked in the class 'TaskSetManager', it 
is important to check whether the variable 'isZombie' is set to true.

This pull request fixes the following hang:

1. Speculation is enabled in the application.
2. The application runs and the last task of shuffleMapStage 1 finishes. To be 
precise: from the DAGScheduler's point of view this stage is finished, and in 
the TaskSetManager the variable 'isZombie' is set to true, but the variable 
runningTasksSet is not empty because of speculation.
3. Suddenly, executor 3 is lost. On receiving this signal, the TaskScheduler 
invokes the executorLost function of all of rootPool's taskSetManagers, and the 
DAGScheduler removes all of that executor's outputLocs.
4. The TaskSetManager adds all of that executor's tasks back to pendingTasks 
and tells the DAGScheduler they will be resubmitted (attention: possibly not 
in time).
5. The DAGScheduler starts to submit a new waiting stage, say shuffleMapStage 2, 
and finds that shuffleMapStage 1 is its missing parent because some outputLocs 
were removed due to the lost executor. The DAGScheduler therefore submits 
shuffleMapStage 1 again.
6. The DAGScheduler keeps receiving 'Resubmitted' task signals from the old 
taskSetManager and increases the number of pendingTasks of shuffleMapStage 1 
each time. However, the old taskSetManager never offers a new task to run, 
because its variable 'isZombie' is set to true.
7. As a result, shuffleMapStage 1 never finishes in the DAGScheduler, and 
neither does any stage that depends on it (see the sketch below).
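
A heavily simplified sketch of the guard this describes, in Scala. This is illustrative only, not the actual TaskSetManager code; all names besides isZombie are hypothetical:

```scala
import scala.collection.mutable

// Check isZombie before re-enqueueing a lost executor's tasks and
// signalling 'Resubmitted'. A zombie task set belongs to a stage that has
// already finished, so resubmitting its tasks would leave the DAGScheduler
// waiting for work that will never be offered again.
class TaskSetSketch(var isZombie: Boolean) {
  private val pendingTasks = mutable.Queue.empty[Int]

  def executorLost(tasksOnExecutor: Seq[Int], notifyResubmitted: Int => Unit): Unit = {
    if (!isZombie) {
      tasksOnExecutor.foreach { taskId =>
        pendingTasks.enqueue(taskId) // rerun later on another executor
        notifyResubmitted(taskId)    // DAGScheduler bumps its pending count
      }
    }
  }
}
```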

## How was this patch tested?

It's quite difficult to construct test cases.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/GavinGavinNo1/spark resolve-stage-blocked2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16855.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16855


commit e15b2abedb6fcaf6bac8775f15bdd246fa22902e
Author: GavinGavinNo1 
Date:   2017-02-08T14:51:59Z

Resolve stage hanging up problem in a particular case







[GitHub] spark issue #16857: [SPARK-19517][SS] KafkaSource fails to initialize partit...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16857
  
Can one of the admins verify this patch?





[GitHub] spark pull request #16857: [SPARK-19517][SS] KafkaSource fails to initialize...

2017-02-08 Thread vitillo
GitHub user vitillo opened a pull request:

https://github.com/apache/spark/pull/16857

[SPARK-19517][SS] KafkaSource fails to initialize partition offsets

## What changes were proposed in this pull request?

This patch fixes a bug in `KafkaSource` with the (de)serialization of the 
length of the JSON string that contains the initial partition offsets.
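
The patch body doesn't include the code itself, so as a hedged sketch of the general shape of such a round trip (Scala, all names hypothetical): a length prefix has to be written and read with symmetric primitives. A classic pitfall in this area is that the single-argument `OutputStream.write(int)` emits only the low-order byte of its argument, so a length written that way silently corrupts any payload longer than 255 bytes.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.nio.charset.StandardCharsets

// Round-trip a JSON payload behind an explicit 4-byte length prefix,
// using writeInt/readInt so both sides agree on the encoding.
def serialize(json: String): Array[Byte] = {
  val bytes = json.getBytes(StandardCharsets.UTF_8)
  val buf = new ByteArrayOutputStream()
  val out = new DataOutputStream(buf)
  out.writeInt(bytes.length) // not out.write(bytes.length): that is one byte only
  out.write(bytes)
  out.flush()
  buf.toByteArray
}

def deserialize(data: Array[Byte]): String = {
  val in = new DataInputStream(new ByteArrayInputStream(data))
  val len = in.readInt()
  val bytes = new Array[Byte](len)
  in.readFully(bytes)
  new String(bytes, StandardCharsets.UTF_8)
}
```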

## How was this patch tested?

I ran the test suite for spark-sql-kafka-0-10.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vitillo/spark kafka_source_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16857.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16857


commit b2523b920de2329878a37f7efc1e9dda5d969b79
Author: Roberto Agostino Vitillo 
Date:   2017-02-08T15:07:40Z

Fix (de)serialization of initial partition offsets.







[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16736
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16736
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72591/
Test FAILed.





[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16736
  
**[Test build #72591 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72591/testReport)**
 for PR 16736 at commit 
[`314f6f8`](https://github.com/apache/spark/commit/314f6f8de6990b1c3bfddea503490a1797e25117).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...

2017-02-08 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/16776#discussion_r100089611
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
@@ -63,44 +63,49 @@ final class DataFrameStatFunctions private[sql](df: 
DataFrame) {
*   Note that values greater than 1 are accepted but give the same 
result as 1.
* @return the approximate quantiles at the given probabilities
*
-   * @note NaN values will be removed from the numerical column before 
calculation
+   * @note null and NaN values will be removed from the numerical column 
before calculation
*
* @since 2.0.0
*/
   def approxQuantile(
   col: String,
   probabilities: Array[Double],
   relativeError: Double): Array[Double] = {
-StatFunctions.multipleApproxQuantiles(df.select(col).na.drop(),
-  Seq(col), probabilities, relativeError).head.toArray
+val res = approxQuantile(Array(col), probabilities, relativeError)
+if (res != null) {
+  res.head
+} else {
+  null
+}
   }
 
   /**
* Calculates the approximate quantiles of numerical columns of a 
DataFrame.
-   * @see [[DataFrameStatsFunctions.approxQuantile(col:Str* 
approxQuantile]] for
-   * detailed description.
+   * @see `DataFrameStatsFunctions.approxQuantile` for detailed 
description.
*
-   * Note that rows containing any null or NaN values values will be 
removed before
-   * calculation.
* @param cols the names of the numerical columns
* @param probabilities a list of quantile probabilities
*   Each number must belong to [0, 1].
*   For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
-   * @param relativeError The relative target precision to achieve (>= 0).
+   * @param relativeError The relative target precision to achieve 
(greater or equal to 0).
*   If set to zero, the exact quantiles are computed, which could be 
very expensive.
*   Note that values greater than 1 are accepted but give the same 
result as 1.
* @return the approximate quantiles at the given probabilities of each 
column
*
-   * @note Rows containing any NaN values will be removed before 
calculation
+   * @note Rows containing any null or NaN values will be removed before 
calculation
*
* @since 2.2.0
*/
   def approxQuantile(
   cols: Array[String],
   probabilities: Array[Double],
   relativeError: Double): Array[Array[Double]] = {
-StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*).na.drop(), cols,
-  probabilities, relativeError).map(_.toArray).toArray
+try {
+  StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*).na.drop(), cols,
--- End diff --

Originally there was no NA dropping in `approxQuantile`, as far as I can 
recall; that was added in #14858. cc @srowen 

You could also simply change the NA dropping to drop only from the columns 
passed as arguments in each version? A sketch of that is below.
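
A sketch of that suggestion, assuming the standard `DataFrameNaFunctions` API (`na.drop` accepts an explicit column list):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Drop rows with null/NaN only in the columns passed as arguments,
// rather than considering every column of the DataFrame.
def quantileInput(df: DataFrame, cols: Array[String]): DataFrame =
  df.na.drop(cols.toSeq).select(cols.map(col): _*)
```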





[GitHub] spark issue #16856: [SPARK-19516][DOC] update public doc to use SparkSession...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16856
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72590/
Test FAILed.





[GitHub] spark pull request #16856: [SPARK-19516][DOC] update public doc to use Spark...

2017-02-08 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16856#discussion_r100089351
  
--- Diff: docs/programming-guide.md ---
@@ -77,9 +76,9 @@ In addition, if you wish to access an HDFS cluster, you 
need to add a dependency
 Finally, you need to import some Spark classes into your program. Add the 
following lines:
 
 {% highlight scala %}
-import org.apache.spark.api.java.JavaSparkContext
-import org.apache.spark.api.java.JavaRDD
-import org.apache.spark.SparkConf
+import org.apache.spark.api.java.JavaSparkContext;
--- End diff --

You don't want semicolons in Scala, right?





[GitHub] spark issue #16856: [SPARK-19516][DOC] update public doc to use SparkSession...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16856
  
**[Test build #72590 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72590/testReport)**
 for PR 16856 at commit 
[`18d6daa`](https://github.com/apache/spark/commit/18d6daa4bc08c265a3984b676cefacc377f72b74).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16856: [SPARK-19516][DOC] update public doc to use SparkSession...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16856
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #16856: [SPARK-19516][DOC] update public doc to use Spark...

2017-02-08 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16856#discussion_r100089566
  
--- Diff: docs/programming-guide.md ---
@@ -244,13 +239,13 @@ use IPython, set the `PYSPARK_DRIVER_PYTHON` variable 
to `ipython` when running
 $ PYSPARK_DRIVER_PYTHON=ipython ./bin/pyspark
 {% endhighlight %}
 
-To use the Jupyter notebook (previously known as the IPython notebook), 
--- End diff --

Several extraneous whitespace changes, but whatever.





[GitHub] spark pull request #16736: [SPARK-19265][SQL][Follow-up] Configurable `table...

2017-02-08 Thread lw-lin
Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16736#discussion_r100089378
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfEntrySuite.scala 
---
@@ -164,6 +164,18 @@ class SQLConfEntrySuite extends SparkFunSuite {
 assert(conf.getConf(confEntry) === Some("a"))
   }
 
+  test("checkValue()") {
--- End diff --

ah you're quite correct! let me update this.





[GitHub] spark issue #16856: [SPARK-19516][DOC] update public doc to use SparkSession...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16856
  
**[Test build #72590 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72590/testReport)**
 for PR 16856 at commit 
[`18d6daa`](https://github.com/apache/spark/commit/18d6daa4bc08c265a3984b676cefacc377f72b74).





[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16736
  
**[Test build #72591 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72591/testReport)**
 for PR 16736 at commit 
[`314f6f8`](https://github.com/apache/spark/commit/314f6f8de6990b1c3bfddea503490a1797e25117).





[GitHub] spark pull request #16736: [SPARK-19265][SQL][Follow-up] Configurable `table...

2017-02-08 Thread lw-lin
Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16736#discussion_r100089218
  
--- Diff: 
core/src/test/scala/org/apache/spark/internal/config/ConfigEntrySuite.scala ---
@@ -128,6 +128,25 @@ class ConfigEntrySuite extends SparkFunSuite {
 assert(conf.get(transformationConf) === "bar")
   }
 
+  test("conf entry: checkValue()") {
+def createConf(default: Int): ConfigEntry[Int] =
+  ConfigBuilder(testKey("checkValue"))
+.intConf
+.checkValue(value => value >= 0, "value must be non-negative")
+.createWithDefault(default)
+
+// this succeeds
+val conf = createConf(10)
+
+// this fails because valueConverter() calls checkValue()
+val e1 = intercept[IllegalArgumentException] { 
conf.valueConverter("-1") }
--- End diff --

sure. thanks!





[GitHub] spark issue #16856: [SPARK-19516][DOC] update public doc to use SparkSession...

2017-02-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16856
  
cc @sameeragarwal @hvanhovell 





[GitHub] spark pull request #16856: [SPARK-19516][DOC] update public doc to use Spark...

2017-02-08 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/16856

[SPARK-19516][DOC] update public doc to use SparkSession instead of 
SparkContext

## What changes were proposed in this pull request?

Since Spark 2.0, `SparkSession` has been the entry point for Spark 
applications. We should update the public documentation to reflect this.
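
For reference, a minimal example of the entry point the docs are moving to:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Example")
  .master("local[*]") // local testing only; omit when submitting to a cluster
  .getOrCreate()

// The older entry points remain reachable from the session when needed.
val sc = spark.sparkContext
```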

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16856.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16856


commit 18d6daa4bc08c265a3984b676cefacc377f72b74
Author: Wenchen Fan 
Date:   2017-02-08T15:18:46Z

update public doc to use SparkSession instead of SparkContext







[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16787
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16787
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72589/
Test FAILed.





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16787
  
**[Test build #72589 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72589/testReport)**
 for PR 16787 at commit 
[`bf09f15`](https://github.com/apache/spark/commit/bf09f15ca7c90138312eb73b819131adf16ac040).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16787
  
**[Test build #72589 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72589/testReport)**
 for PR 16787 at commit 
[`bf09f15`](https://github.com/apache/spark/commit/bf09f15ca7c90138312eb73b819131adf16ac040).





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16787
  
retest this please





[GitHub] spark issue #16853: [SPARK-19464][BUILD][HOTFIX][test-hadoop2.6] Add back mo...

2017-02-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16853
  
+1





[GitHub] spark issue #16854: [WIP][SPARK-15463][SQL] Add an API to load DataFrame fro...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16854
  
**[Test build #72588 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72588/testReport)**
 for PR 16854 at commit 
[`eabb3f3`](https://github.com/apache/spark/commit/eabb3f3f83da2d74cb24bf483639c85f7466a56e).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class UnivocityParser(`





[GitHub] spark issue #16854: [WIP][SPARK-15463][SQL] Add an API to load DataFrame fro...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16854
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72588/
Test FAILed.





[GitHub] spark pull request #16848: [SPARK-19279][SQL][Follow-up] Infer Schema for Hi...

2017-02-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16848





[GitHub] spark issue #16854: [WIP][SPARK-15463][SQL] Add an API to load DataFrame fro...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16854
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #16855: [SPARK-13931] Resolve stage hanging up problem in...

2017-02-08 Thread GavinGavinNo1
Github user GavinGavinNo1 closed the pull request at:

https://github.com/apache/spark/pull/16855





[GitHub] spark issue #16854: [WIP][SPARK-15463][SQL] Add an API to load DataFrame fro...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16854
  
**[Test build #72588 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72588/testReport)**
 for PR 16854 at commit 
[`eabb3f3`](https://github.com/apache/spark/commit/eabb3f3f83da2d74cb24bf483639c85f7466a56e).





[GitHub] spark issue #16855: [SPARK-13931] Resolve stage hanging up problem in a part...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16855
  
Can one of the admins verify this patch?





[GitHub] spark issue #16848: [SPARK-19279][SQL][Follow-up] Infer Schema for Hive Serd...

2017-02-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16848
  
Thanks! Merging to master.





[GitHub] spark issue #16854: [WIP][SPARK-15463][SQL] Add an API to load DataFrame fro...

2017-02-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16854
  
Let me try to add a Java one and address more of the comments tomorrow.





[GitHub] spark issue #16804: [SPARK-19459][SQL] Add Hive datatype (char/varchar) to S...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16804
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16804: [SPARK-19459][SQL] Add Hive datatype (char/varchar) to S...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16804
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72587/
Test FAILed.





[GitHub] spark pull request #16855: [SPARK-13931] Resolve stage hanging up problem in...

2017-02-08 Thread GavinGavinNo1
GitHub user GavinGavinNo1 opened a pull request:

https://github.com/apache/spark/pull/16855

[SPARK-13931] Resolve stage hanging up problem in a particular case

## What changes were proposed in this pull request?
When the function 'executorLost' is invoked in class 'TaskSetManager', it is 
essential to first check whether the variable 'isZombie' is set to true.

This pull request fixes the following hang:

1. Enable speculation in the application.
2. Run the app and suppose the last task of shuffleMapStage 1 finishes. To get 
the record straight: from the DAGScheduler's point of view this stage really 
finishes, and in the TaskSetManager the variable 'isZombie' is set to true, but 
the variable runningTasksSet isn't empty because of speculation.
3. Suddenly, executor 3 is lost. On receiving this signal, the TaskScheduler 
invokes the executorLost functions of all of rootPool's taskSetManagers, and 
the DAGScheduler removes all of this executor's outputLocs.
4. The TaskSetManager adds all of this executor's tasks back to pendingTasks 
and tells the DAGScheduler they will be resubmitted (attention: possibly not 
on time).
5. The DAGScheduler starts to submit a new waiting stage, say shuffleMapStage 
2, and finds that shuffleMapStage 1 is its missing parent because some 
outputLocs were removed due to the executor loss. The DAGScheduler therefore 
submits shuffleMapStage 1 again.
6. The DAGScheduler still receives 'Resubmitted' signals from the old 
taskSetManager and increases the number of pendingTasks of shuffleMapStage 1 
each time. However, the old taskSetManager never submits any new task because 
its variable 'isZombie' is set to true.
7. Finally, shuffleMapStage 1 never finishes in the DAGScheduler, together 
with all stages depending on it.
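
A minimal, self-contained sketch of the core idea (class and method names 
below are illustrative stand-ins, not the actual Spark internals):

```scala
// Sketch: a zombie task-set manager must not report resubmissions it will
// never act on, otherwise the DAG side's pending-task count grows forever.
import scala.collection.mutable

class SketchTaskSetManager(var isZombie: Boolean) {
  private val pendingTasks = mutable.Queue.empty[Int]

  def executorLost(lostTasks: Seq[Int], notifyResubmitted: Int => Unit): Unit = {
    if (!isZombie) {              // the guard this PR describes
      lostTasks.foreach { t =>
        pendingTasks.enqueue(t)   // re-enqueue for rescheduling
        notifyResubmitted(t)      // report 'Resubmitted' only when we will act
      }
    }
    // a zombie manager does nothing here; the stage should instead be
    // resubmitted with a fresh, non-zombie TaskSetManager
  }
}
```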

## How was this patch tested?

It's quite difficult to construct test cases.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/GavinGavinNo1/spark resolve-stage-blocked2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16855.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16855


commit e15b2abedb6fcaf6bac8775f15bdd246fa22902e
Author: GavinGavinNo1 
Date:   2017-02-08T14:51:59Z

Resolve stage hanging up problem in a particular case







[GitHub] spark issue #16804: [SPARK-19459][SQL] Add Hive datatype (char/varchar) to S...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16804
  
**[Test build #72587 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72587/testReport)**
 for PR 16804 at commit 
[`e7ca0ea`](https://github.com/apache/spark/commit/e7ca0ead843f2c9650e690fe649be18fa6389e48).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...

2017-02-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16776#discussion_r100086037
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
@@ -63,44 +63,49 @@ final class DataFrameStatFunctions private[sql](df: 
DataFrame) {
*   Note that values greater than 1 are accepted but give the same 
result as 1.
* @return the approximate quantiles at the given probabilities
*
-   * @note NaN values will be removed from the numerical column before 
calculation
+   * @note null and NaN values will be removed from the numerical column 
before calculation
*
* @since 2.0.0
*/
   def approxQuantile(
   col: String,
   probabilities: Array[Double],
   relativeError: Double): Array[Double] = {
-StatFunctions.multipleApproxQuantiles(df.select(col).na.drop(),
-  Seq(col), probabilities, relativeError).head.toArray
+val res = approxQuantile(Array(col), probabilities, relativeError)
+if (res != null) {
+  res.head
+} else {
+  null
+}
   }
 
   /**
* Calculates the approximate quantiles of numerical columns of a 
DataFrame.
-   * @see [[DataFrameStatsFunctions.approxQuantile(col:Str* 
approxQuantile]] for
-   * detailed description.
+   * @see `DataFrameStatsFunctions.approxQuantile` for detailed 
description.
*
-   * Note that rows containing any null or NaN values values will be 
removed before
-   * calculation.
* @param cols the names of the numerical columns
* @param probabilities a list of quantile probabilities
*   Each number must belong to [0, 1].
*   For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
-   * @param relativeError The relative target precision to achieve (>= 0).
+   * @param relativeError The relative target precision to achieve 
(greater or equal to 0).
*   If set to zero, the exact quantiles are computed, which could be 
very expensive.
*   Note that values greater than 1 are accepted but give the same 
result as 1.
* @return the approximate quantiles at the given probabilities of each 
column
*
-   * @note Rows containing any NaN values will be removed before 
calculation
+   * @note Rows containing any null or NaN values will be removed before 
calculation
*
* @since 2.2.0
*/
   def approxQuantile(
   cols: Array[String],
   probabilities: Array[Double],
   relativeError: Double): Array[Array[Double]] = {
-StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): 
_*).na.drop(), cols,
-  probabilities, relativeError).map(_.toArray).toArray
+try {
+  StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): 
_*).na.drop(), cols,
--- End diff --

@zhengruifeng Sure. If we want to make them consistent, I am fine. How 
about reverting https://github.com/apache/spark/pull/12135 at first? At the 
same time, we can work on the new solution.
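
For context, a minimal usage sketch of the two overloads under discussion, 
assuming a SparkSession `spark` is in scope (the drop-null/NaN semantics in 
the comments are exactly what this thread is debating, so treat them as 
provisional):

```scala
import spark.implicits._

val df = Seq((1.0, 2.0), (Double.NaN, 3.0), (4.0, 5.0)).toDF("a", "b")

// Single-column overload: null/NaN values in "a" are dropped before the
// quantile is computed.
val median: Array[Double] = df.stat.approxQuantile("a", Array(0.5), 0.01)

// Multi-column overload (since 2.2.0): rows containing any null/NaN value
// are dropped before computing the quantiles of every listed column.
val quartiles: Array[Array[Double]] =
  df.stat.approxQuantile(Array("a", "b"), Array(0.25, 0.75), 0.01)
```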





[GitHub] spark issue #16804: [SPARK-19459][SQL] Add Hive datatype (char/varchar) to S...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16804
  
**[Test build #72587 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72587/testReport)**
 for PR 16804 at commit 
[`e7ca0ea`](https://github.com/apache/spark/commit/e7ca0ead843f2c9650e690fe649be18fa6389e48).





[GitHub] spark pull request #16854: [SPARK-15463][SQL] Add an API to load DataFrame f...

2017-02-08 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/16854

[SPARK-15463][SQL] Add an API to load DataFrame from Dataset[String]

## What changes were proposed in this pull request?

This PR proposes to add an API that loads a `DataFrame` from a 
`Dataset[String]`.

It allows pre-processing the data before loading it as CSV, which enables 
workarounds for many narrow use cases.

- Case 1 - pre-processing (a fuller sketch follows this list)

  ```scala
  val df = spark.read.text("...")
  // Pre-processing with this.
  spark.read.csv(df.as[String])
  ```

- Case 2 - use other input formats

  ```scala
  val rdd = spark.sparkContext.newAPIHadoopFile("/file.csv.lzo",
classOf[com.hadoop.mapreduce.LzoTextInputFormat],
classOf[org.apache.hadoop.io.LongWritable],
classOf[org.apache.hadoop.io.Text])

  spark.read.csv(rdd.toDS)
  ```
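
As a fuller sketch of Case 1, here is what a concrete pre-processing step 
could look like with the API this PR proposes (the file path and the 
comment-stripping predicate are made up for illustration):

```scala
import spark.implicits._

// Read the raw file as plain text, drop comment lines, then parse the
// remainder as CSV via the proposed Dataset[String] overload.
val raw = spark.read.text("/path/to/data.csv").as[String]
val cleaned = raw.filter(line => !line.startsWith("#"))
val df = spark.read.option("header", "true").csv(cleaned)
```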

## How was this patch tested?

Added tests in `CSVSuite`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-15463

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16854.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16854


commit eabb3f3f83da2d74cb24bf483639c85f7466a56e
Author: hyukjinkwon 
Date:   2017-02-08T14:46:55Z

Add an API to load DataFrame from Dataset[String]







[GitHub] spark pull request #16804: [SPARK-19459][SQL] Add Hive datatype (char/varcha...

2017-02-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16804#discussion_r100085484
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcSourceSuite.scala ---
@@ -162,6 +162,40 @@ abstract class OrcSuite extends QueryTest with 
TestHiveSingleton with BeforeAndA
   hiveClient.runSqlHive("DROP TABLE IF EXISTS orc_varchar")
 }
   }
+
+  test("SPARK-19459: read char/varchar column written by Hive") {
+val hiveClient = 
spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
+val location = Utils.createTempDir().toURI
+try {
+  hiveClient.runSqlHive(
+"""
+   |CREATE EXTERNAL TABLE hive_orc(
+   |  a STRING,
+   |  b CHAR(10),
+   |  c VARCHAR(10))
+   |STORED AS orc""".stripMargin)
+  // Hive throws an exception if I assign the location in the create 
table statement.
+  hiveClient.runSqlHive(
+s"ALTER TABLE hive_orc SET LOCATION '$location'")
+  hiveClient.runSqlHive(
+"INSERT INTO TABLE hive_orc SELECT 'a', 'b', 'c' FROM (SELECT 1) 
t")
+
--- End diff --

Done.





[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...

2017-02-08 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/16689
  
@felixcheung @titicaca Just to make sure I understand: was collect on a 
timestamp column getting `c("POSIXct", "POSIXt")` even before this change?





[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16386#discussion_r100083481
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 ---
@@ -1764,4 +1769,125 @@ class JsonSuite extends QueryTest with 
SharedSQLContext with TestJsonData {
 val df2 = spark.read.option("PREfersdecimaL", "true").json(records)
 assert(df2.schema == schema)
   }
+
+  test("SPARK-18352: Parse normal multi-line JSON files (compressed)") {
+withTempDir { dir =>
+  dir.delete()
+  val path = dir.getCanonicalPath
+  primitiveFieldAndType
+.toDF("value")
+.write
+.option("compression", "GzIp")
+.text(path)
+
+  new File(path).listFiles() match {
+case compressedFiles =>
+  assert(compressedFiles.exists(_.getName.endsWith(".gz")))
+  }
+
+  val jsonDF = spark.read.option("wholeFile", true).json(path)
+  val jsonDir = new File(dir, "json").getCanonicalPath
+  jsonDF.coalesce(1).write
+.format("json")
+.option("compression", "gZiP")
+.save(jsonDir)
+
+  new File(jsonDir).listFiles() match {
+case compressedFiles =>
+  assert(compressedFiles.exists(_.getName.endsWith(".json.gz")))
+  }
+
+  val jsonCopy = spark.read
+.format("json")
+.load(jsonDir)
+
+  assert(jsonCopy.count === jsonDF.count)
+  val jsonCopySome = jsonCopy.selectExpr("string", "long", "boolean")
+  val jsonDFSome = jsonDF.selectExpr("string", "long", "boolean")
+  checkAnswer(jsonCopySome, jsonDFSome)
+}
+  }
+
+  test("SPARK-18352: Parse normal multi-line JSON files (uncompressed)") {
+withTempDir { dir =>
+  dir.delete()
+  val path = dir.getCanonicalPath
+  primitiveFieldAndType
+.toDF("value")
+.write
+.text(path)
+
+  val jsonDF = spark.read.option("wholeFile", true).json(path)
+  val jsonDir = new File(dir, "json").getCanonicalPath
+  jsonDF.coalesce(1).write
+.format("json")
+.save(jsonDir)
+
+  val compressedFiles = new File(jsonDir).listFiles()
+  assert(compressedFiles.exists(_.getName.endsWith(".json")))
+
+  val jsonCopy = spark.read
+.format("json")
+.load(jsonDir)
+
+  assert(jsonCopy.count === jsonDF.count)
+  val jsonCopySome = jsonCopy.selectExpr("string", "long", "boolean")
+  val jsonDFSome = jsonDF.selectExpr("string", "long", "boolean")
+  checkAnswer(jsonCopySome, jsonDFSome)
+}
+  }
+
+  test("SPARK-18352: Expect one JSON document per file") {
+// the json parser terminates as soon as it sees a matching END_OBJECT 
or END_ARRAY token.
+// this might not be the optimal behavior but this test verifies that 
only the first value
+// is parsed and the rest are discarded.
+
+// alternatively the parser could continue parsing following objects, 
which may further reduce
+// allocations by skipping the line reader entirely
+
+withTempDir { dir =>
+  dir.delete()
+  val path = dir.getCanonicalPath
+  primitiveFieldAndType
+.flatMap(Iterator.fill(3)(_) ++ Iterator("\n{invalid}"))
--- End diff --

can we write a JSON string literal to the text file? It's hard to understand 
what's going on here...
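
For example, a sketch of what that could look like, with explicit literals in 
place of `primitiveFieldAndType` (the strings and the `path` variable are 
illustrative):

```scala
import spark.implicits._

// One JSON document plus trailing garbage per line, so the test's intent --
// only the first value per file is parsed -- is visible at a glance.
Seq(
  """{"a": 1} {"ignored": true}""",
  """{"a": 2} {invalid}"""
).toDS().write.text(path)
```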





[GitHub] spark issue #16831: [SPARK-19263] Fix race in SchedulerIntegrationSuite.

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16831
  
**[Test build #3562 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3562/testReport)**
 for PR 16831 at commit 
[`67fe5df`](https://github.com/apache/spark/commit/67fe5dfe9d00c628c15078d8d99c5b0de3962946).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16386#discussion_r100082372
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 ---
@@ -1764,4 +1769,125 @@ class JsonSuite extends QueryTest with 
SharedSQLContext with TestJsonData {
 val df2 = spark.read.option("PREfersdecimaL", "true").json(records)
 assert(df2.schema == schema)
   }
+
+  test("SPARK-18352: Parse normal multi-line JSON files (compressed)") {
+withTempDir { dir =>
+  dir.delete()
--- End diff --

looks like you need `withTempPath`
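
For reference, `withTempPath` (from `SQLTestUtils`) hands the body a path that 
does not yet exist and deletes it afterwards, so the manual `dir.delete()` 
becomes unnecessary. A minimal sketch of the reworked setup:

```scala
withTempPath { dir =>
  val path = dir.getCanonicalPath
  primitiveFieldAndType
    .toDF("value")
    .write
    .option("compression", "GzIp")
    .text(path)
  val jsonDF = spark.read.option("wholeFile", true).json(path)
  // ... remaining assertions as in the test above ...
}
```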





[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16386#discussion_r100081170
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala
 ---
@@ -0,0 +1,213 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.json
+
+import java.io.InputStream
+
+import scala.reflect.ClassTag
+
+import com.fasterxml.jackson.core.{JsonFactory, JsonParser}
+import com.google.common.io.ByteStreams
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, Path}
+import org.apache.hadoop.io.{LongWritable, Text}
+import org.apache.hadoop.mapreduce.Job
+import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, 
TextInputFormat}
+
+import org.apache.spark.TaskContext
+import org.apache.spark.input.{PortableDataStream, StreamInputFormat}
+import org.apache.spark.rdd.{BinaryFileRDD, RDD}
+import org.apache.spark.sql.{AnalysisException, SparkSession}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.json.{CreateJacksonParser, 
JacksonParser, JSONOptions}
+import org.apache.spark.sql.execution.datasources.{CodecStreams, 
HadoopFileLinesReader, PartitionedFile}
+import org.apache.spark.sql.types.StructType
+import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.util.Utils
+
+/**
+ * Common functions for parsing JSON files
+ * @tparam T A datatype containing the unparsed JSON, such as [[Text]] or 
[[String]]
+ */
+abstract class JsonDataSource[T] extends Serializable {
+  def isSplitable: Boolean
+
+  /**
+   * Parse a [[PartitionedFile]] into 0 or more [[InternalRow]] instances
+   */
+  def readFile(
+conf: Configuration,
+file: PartitionedFile,
+parser: JacksonParser): Iterator[InternalRow]
+
+  /**
+   * Create an [[RDD]] that handles the preliminary parsing of [[T]] 
records
+   */
+  protected def createBaseRdd(
+sparkSession: SparkSession,
+inputPaths: Seq[FileStatus]): RDD[T]
+
+  /**
+   * A generic wrapper to invoke the correct [[JsonFactory]] method to 
allocate a [[JsonParser]]
+   * for an instance of [[T]]
+   */
+  def createParser(jsonFactory: JsonFactory, value: T): JsonParser
+
+  final def infer(
+  sparkSession: SparkSession,
+  inputPaths: Seq[FileStatus],
+  parsedOptions: JSONOptions): Option[StructType] = {
+if (inputPaths.nonEmpty) {
+  val jsonSchema = InferSchema.infer(
+createBaseRdd(sparkSession, inputPaths),
+parsedOptions,
+createParser)
+  checkConstraints(jsonSchema)
+  Some(jsonSchema)
+} else {
+  None
+}
+  }
+
+  /** Constraints to be imposed on schema to be stored. */
+  private def checkConstraints(schema: StructType): Unit = {
+if (schema.fieldNames.length != schema.fieldNames.distinct.length) {
+  val duplicateColumns = schema.fieldNames.groupBy(identity).collect {
+case (x, ys) if ys.length > 1 => "\"" + x + "\""
+  }.mkString(", ")
+  throw new AnalysisException(s"Duplicate column(s) : 
$duplicateColumns found, " +
+s"cannot save to JSON format")
+}
+  }
+}
+
+object JsonDataSource {
+  def apply(options: JSONOptions): JsonDataSource[_] = {
+if (options.wholeFile) {
+  WholeFileJsonDataSource
+} else {
+  TextInputJsonDataSource
+}
+  }
+
+  /**
+   * Create a new [[RDD]] via the supplied callback if there is at least 
one file to process,
+   * otherwise an [[org.apache.spark.rdd.EmptyRDD]] will be returned.
+   */
+  def createBaseRddConf[T : ClassTag](
--- End diff --

why call it `createBaseRddConf` instead of `createBaseRdd`?



[GitHub] spark issue #16853: [SPARK-19464][BUILD][HOTFIX][test-hadoop2.6] Add back mo...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16853
  
**[Test build #3564 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3564/testReport)**
 for PR 16853 at commit 
[`c791fdb`](https://github.com/apache/spark/commit/c791fdb8abdcda60bb3c06fe06cca7f77ea9bdc6).





[GitHub] spark issue #16850: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16850
  
**[Test build #72586 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72586/testReport)**
 for PR 16850 at commit 
[`5025cb7`](https://github.com/apache/spark/commit/5025cb7511a43e24cb3a181eb7b06c69b024479f).





[GitHub] spark pull request #16841: [SPARK-18871][SQL][TESTS] New test cases for IN/N...

2017-02-08 Thread nsyca
Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16841#discussion_r100076749
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-having.sql.out
 ---
@@ -0,0 +1,217 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 12
+
+
+-- !query 0
+create temporary view t1 as select * from values
+  ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 
01:00:00.000', date '2014-04-04'),
+  ("val1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04'),
+  ("val1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 
01:02:00.001', date '2014-06-04'),
+  ("val1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 
01:01:00.000', date '2014-07-04'),
+  ("val1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 
01:02:00.001', date '2014-05-05'),
+  ("val1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 
01:01:00.000', null),
+  ("val1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 
01:02:00.001', null),
+  ("val1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 
01:01:00.000', date '2014-08-04'),
+  ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 
01:02:00.001', date '2014-09-04'),
+  ("val1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 
01:01:00.000', date '2015-05-04'),
+  ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 
01:02:00.001', date '2014-04-04'),
+  ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04')
+  as t1(t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+create temporary view t2 as select * from values
+  ("val2a", 6S, 12, 14L, float(15), 20D, 20E2, timestamp '2014-04-04 
01:01:00.000', date '2014-04-04'),
+  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04'),
+  ("val1b", 8S, 16, 119L, float(17), 25D, 26E2, timestamp '2015-05-04 
01:01:00.000', date '2015-05-04'),
+  ("val1c", 12S, 16, 219L, float(17), 25D, 26E2, timestamp '2016-05-04 
01:01:00.000', date '2016-05-04'),
+  ("val1b", null, 16, 319L, float(17), 25D, 26E2, timestamp '2017-05-04 
01:01:00.000', null),
+  ("val2e", 8S, null, 419L, float(17), 25D, 26E2, timestamp '2014-06-04 
01:01:00.000', date '2014-06-04'),
+  ("val1f", 19S, null, 519L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04'),
+  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-06-04 
01:01:00.000', date '2014-06-04'),
+  ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 
01:01:00.000', date '2014-07-04'),
+  ("val1c", 12S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-08-04 
01:01:00.000', date '2014-08-05'),
+  ("val1e", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 
01:01:00.000', date '2014-09-04'),
+  ("val1f", 19S, null, 19L, float(17), 25D, 26E2, timestamp '2014-10-04 
01:01:00.000', date '2014-10-04'),
+  ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', null)
+  as t2(t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+create temporary view t3 as select * from values
+  ("val3a", 6S, 12, 110L, float(15), 20D, 20E2, timestamp '2014-04-04 
01:02:00.000', date '2014-04-04'),
+  ("val3a", 6S, 12, 10L, float(15), 20D, 20E2, timestamp '2014-05-04 
01:02:00.000', date '2014-05-04'),
+  ("val1b", 10S, 12, 219L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:02:00.000', date '2014-05-04'),
+  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:02:00.000', date '2014-05-04'),
+  ("val1b", 8S, 16, 319L, float(17), 25D, 26E2, timestamp '2014-06-04 
01:02:00.000', date '2014-06-04'),
+  ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 
01:02:00.000', date '2014-07-04'),
+  ("val3c", 17S, 16, 519L, float(17), 25D, 26E2, timestamp '2014-08-04 
01:02:00.000', date '2014-08-04'),
+  ("val3c", 17S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 
01:02:00.000', date '2014-09-05'),
+  ("val1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 
01:02:00.000', null),
+  ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 
01:02:00.000', null),
+  ("val3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:02:00.000', date '2014-05-04'),
+  ("val3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 
01:02:00.000', date '2015-05-04')
+  as t3(t3a, t3b, t3c, t3d, t3e, t3f, t3g, t3h, t3i)
+-- !query 2 schema

[GitHub] spark pull request #16841: [SPARK-18871][SQL][TESTS] New test cases for IN/N...

2017-02-08 Thread nsyca
Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16841#discussion_r100077204
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-joins.sql.out
 ---
@@ -0,0 +1,353 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 14
+
+
+-- !query 0
+create temporary view t1 as select * from values
+  ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 
01:00:00.000', date '2014-04-04'),
+  ("val1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04'),
+  ("val1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 
01:02:00.001', date '2014-06-04'),
+  ("val1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 
01:01:00.000', date '2014-07-04'),
+  ("val1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 
01:02:00.001', date '2014-05-05'),
+  ("val1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 
01:01:00.000', null),
+  ("val1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 
01:02:00.001', null),
+  ("val1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 
01:01:00.000', date '2014-08-04'),
+  ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 
01:02:00.001', date '2014-09-04'),
+  ("val1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 
01:01:00.000', date '2015-05-04'),
+  ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 
01:02:00.001', date '2014-04-04'),
+  ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04')
+  as t1(t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+create temporary view t2 as select * from values
+  ("val2a", 6S, 12, 14L, float(15), 20D, 20E2, timestamp '2014-04-04 
01:01:00.000', date '2014-04-04'),
+  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04'),
+  ("val1b", 8S, 16, 119L, float(17), 25D, 26E2, timestamp '2015-05-04 
01:01:00.000', date '2015-05-04'),
+  ("val1c", 12S, 16, 219L, float(17), 25D, 26E2, timestamp '2016-05-04 
01:01:00.000', date '2016-05-04'),
+  ("val1b", null, 16, 319L, float(17), 25D, 26E2, timestamp '2017-05-04 
01:01:00.000', null),
+  ("val2e", 8S, null, 419L, float(17), 25D, 26E2, timestamp '2014-06-04 
01:01:00.000', date '2014-06-04'),
+  ("val1f", 19S, null, 519L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04'),
+  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-06-04 
01:01:00.000', date '2014-06-04'),
+  ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 
01:01:00.000', date '2014-07-04'),
+  ("val1c", 12S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-08-04 
01:01:00.000', date '2014-08-05'),
+  ("val1e", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 
01:01:00.000', date '2014-09-04'),
+  ("val1f", 19S, null, 19L, float(17), 25D, 26E2, timestamp '2014-10-04 
01:01:00.000', date '2014-10-04'),
+  ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', null)
+  as t2(t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+create temporary view t3 as select * from values
+  ("val3a", 6S, 12, 110L, float(15), 20D, 20E2, timestamp '2014-04-04 
01:02:00.000', date '2014-04-04'),
+  ("val3a", 6S, 12, 10L, float(15), 20D, 20E2, timestamp '2014-05-04 
01:02:00.000', date '2014-05-04'),
+  ("val1b", 10S, 12, 219L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:02:00.000', date '2014-05-04'),
+  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:02:00.000', date '2014-05-04'),
+  ("val1b", 8S, 16, 319L, float(17), 25D, 26E2, timestamp '2014-06-04 
01:02:00.000', date '2014-06-04'),
+  ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 
01:02:00.000', date '2014-07-04'),
+  ("val3c", 17S, 16, 519L, float(17), 25D, 26E2, timestamp '2014-08-04 
01:02:00.000', date '2014-08-04'),
+  ("val3c", 17S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 
01:02:00.000', date '2014-09-05'),
+  ("val1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 
01:02:00.000', null),
+  ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 
01:02:00.000', null),
+  ("val3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:02:00.000', date '2014-05-04'),
+  ("val3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 
01:02:00.000', date '2015-05-04')
+  as t3(t3a, t3b, t3c, t3d, t3e, t3f, t3g, t3h, t3i)
+-- !query 2 schema

[GitHub] spark issue #16850: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16850
  
**[Test build #3563 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3563/testReport)**
 for PR 16850 at commit 
[`5025cb7`](https://github.com/apache/spark/commit/5025cb7511a43e24cb3a181eb7b06c69b024479f).





[GitHub] spark pull request #16841: [SPARK-18871][SQL][TESTS] New test cases for IN/N...

2017-02-08 Thread nsyca
Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16841#discussion_r100077423
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-multiple-columns.sql.out
 ---
@@ -0,0 +1,178 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 8
+
+
+-- !query 0
+create temporary view t1 as select * from values
+  ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 
01:00:00.000', date '2014-04-04'),
+  ("val1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04'),
+  ("val1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 
01:02:00.001', date '2014-06-04'),
+  ("val1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 
01:01:00.000', date '2014-07-04'),
+  ("val1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 
01:02:00.001', date '2014-05-05'),
+  ("val1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 
01:01:00.000', null),
+  ("val1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 
01:02:00.001', null),
+  ("val1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 
01:01:00.000', date '2014-08-04'),
+  ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 
01:02:00.001', date '2014-09-04'),
+  ("val1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 
01:01:00.000', date '2015-05-04'),
+  ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 
01:02:00.001', date '2014-04-04'),
+  ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04')
+  as t1(t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+create temporary view t2 as select * from values
+  ("val2a", 6S, 12, 14L, float(15), 20D, 20E2, timestamp '2014-04-04 
01:01:00.000', date '2014-04-04'),
+  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04'),
+  ("val1b", 8S, 16, 119L, float(17), 25D, 26E2, timestamp '2015-05-04 
01:01:00.000', date '2015-05-04'),
+  ("val1c", 12S, 16, 219L, float(17), 25D, 26E2, timestamp '2016-05-04 
01:01:00.000', date '2016-05-04'),
+  ("val1b", null, 16, 319L, float(17), 25D, 26E2, timestamp '2017-05-04 
01:01:00.000', null),
+  ("val2e", 8S, null, 419L, float(17), 25D, 26E2, timestamp '2014-06-04 
01:01:00.000', date '2014-06-04'),
+  ("val1f", 19S, null, 519L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04'),
+  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-06-04 
01:01:00.000', date '2014-06-04'),
+  ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 
01:01:00.000', date '2014-07-04'),
+  ("val1c", 12S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-08-04 
01:01:00.000', date '2014-08-05'),
+  ("val1e", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 
01:01:00.000', date '2014-09-04'),
+  ("val1f", 19S, null, 19L, float(17), 25D, 26E2, timestamp '2014-10-04 
01:01:00.000', date '2014-10-04'),
+  ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', null)
+  as t2(t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+create temporary view t3 as select * from values
+  ("val3a", 6S, 12, 110L, float(15), 20D, 20E2, timestamp '2014-04-04 
01:02:00.000', date '2014-04-04'),
+  ("val3a", 6S, 12, 10L, float(15), 20D, 20E2, timestamp '2014-05-04 
01:02:00.000', date '2014-05-04'),
+  ("val1b", 10S, 12, 219L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:02:00.000', date '2014-05-04'),
+  ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:02:00.000', date '2014-05-04'),
+  ("val1b", 8S, 16, 319L, float(17), 25D, 26E2, timestamp '2014-06-04 
01:02:00.000', date '2014-06-04'),
+  ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 
01:02:00.000', date '2014-07-04'),
+  ("val3c", 17S, 16, 519L, float(17), 25D, 26E2, timestamp '2014-08-04 
01:02:00.000', date '2014-08-04'),
+  ("val3c", 17S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 
01:02:00.000', date '2014-09-05'),
+  ("val1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 
01:02:00.000', null),
+  ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 
01:02:00.000', null),
+  ("val3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:02:00.000', date '2014-05-04'),
+  ("val3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 
01:02:00.000', date '2015-05-04')
+  as t3(t3a, t3b, t3c, t3d, t3e, t3f, t3g, t3h, t3i)
+-- !query 2 

[GitHub] spark issue #16850: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...

2017-02-08 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/16850
  
jenkins test this please





[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16760
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16760
  
**[Test build #72585 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72585/testReport)**
 for PR 16760 at commit 
[`2473e0c`](https://github.com/apache/spark/commit/2473e0c440a9d1cd761ae6d704d0aa02c63afd83).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16760
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72585/
Test FAILed.





[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16760
  
**[Test build #72585 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72585/testReport)**
 for PR 16760 at commit 
[`2473e0c`](https://github.com/apache/spark/commit/2473e0c440a9d1cd761ae6d704d0aa02c63afd83).





[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

2017-02-08 Thread nsyca
Github user nsyca commented on the issue:

https://github.com/apache/spark/pull/16760
  
retest this please





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16787
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72584/
Test FAILed.





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16787
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16787
  
**[Test build #72584 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72584/testReport)**
 for PR 16787 at commit 
[`bf09f15`](https://github.com/apache/spark/commit/bf09f15ca7c90138312eb73b819131adf16ac040).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16787
  
**[Test build #72584 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72584/testReport)**
 for PR 16787 at commit 
[`bf09f15`](https://github.com/apache/spark/commit/bf09f15ca7c90138312eb73b819131adf16ac040).





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-08 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100070021
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala 
---
@@ -463,117 +459,6 @@ private[spark] object HiveUtils extends Logging {
 case (other, tpe) if primitiveTypes contains tpe => other.toString
   }
 
-  /** Converts the native StructField to Hive's FieldSchema. */
-  private def toHiveColumn(c: StructField): FieldSchema = {
-val typeString = if (c.metadata.contains(HiveUtils.hiveTypeString)) {
-  c.metadata.getString(HiveUtils.hiveTypeString)
-} else {
-  c.dataType.catalogString
-}
-new FieldSchema(c.name, typeString, c.getComment.orNull)
-  }
-
-  /** Builds the native StructField from Hive's FieldSchema. */
-  private def fromHiveColumn(hc: FieldSchema): StructField = {
-val columnType = try {
-  CatalystSqlParser.parseDataType(hc.getType)
-} catch {
-  case e: ParseException =>
-throw new SparkException("Cannot recognize hive type string: " + 
hc.getType, e)
-}
-
-val metadata = new 
MetadataBuilder().putString(HiveUtils.hiveTypeString, hc.getType).build()
-val field = StructField(
-  name = hc.getName,
-  dataType = columnType,
-  nullable = true,
-  metadata = metadata)
-Option(hc.getComment).map(field.withComment).getOrElse(field)
-  }
-
-  // TODO: merge this with HiveClientImpl#toHiveTable
-  /** Converts the native table metadata representation format 
CatalogTable to Hive's Table. */
-  def toHiveTable(catalogTable: CatalogTable): HiveTable = {
--- End diff --

This method has been deleted; we now use HiveClientImpl.toHiveTable, which 
uses a shim to set the location. In HiveClientImpl the Hive version may not be 
the same as the default Hive (1.2.1), so it uses a runtime shim for 
setDataLocation, whereas the HiveUtils.toHiveTable deleted here was only for 
runtime Hive execution and never interacted with the metastore.





[GitHub] spark issue #16831: [SPARK-19263] Fix race in SchedulerIntegrationSuite.

2017-02-08 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16831
  
@squito 
Many thanks for your help. You are such a kind person : )





[GitHub] spark issue #16831: [SPARK-19263] Fix race in SchedulerIntegrationSuite.

2017-02-08 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/16831
  
@jinxing64 that way of testing is fine, but I find it's much faster to use 
sbt.

http://www.scala-sbt.org/0.13/docs/Testing.html

```
build/sbt -Pyarn -Phadoop-2.6 -Phive-thriftserver -Dhadoop.version=2.6.5
[this will put you in an sbt console]
> project core
> testOnly *DAGSchedulerSuite
[run all tests that match the pattern -- in this case, one suite]
> testOnly *spark.scheduler.*
[this time we run everything in the scheduler package]
>~testOnly *DAGSchedulerSuite
[the '~' in front means that as we modify the code (e.g. in another terminal 
or an IDE), sbt will re-run the tests every time the source changes.]
>~testOnly *DAGSchedulerSuite -- -z "SPARK-12345"
[as above, but only run tests within that suite whose name matches the 
pattern]
```

The last variant is the quickest way for me to run one test repeatedly as I'm 
developing. Because it runs every time I save changes to disk, it often runs 
while my code is in some bad state and everything fails. But that's no big 
deal: it just runs again when I fix things, so I ignore the window with the 
running tests until I think I have things in an OK state.

There is some more description of the arguments to scalatest itself (e.g. 
`-z`) at http://www.scalatest.org/user_guide/using_the_runner.





[GitHub] spark issue #16848: [SPARK-19279][SQL][Follow-up] Infer Schema for Hive Serd...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16848
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16848: [SPARK-19279][SQL][Follow-up] Infer Schema for Hive Serd...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16848
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72577/
Test PASSed.





[GitHub] spark issue #16848: [SPARK-19279][SQL][Follow-up] Infer Schema for Hive Serd...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16848
  
**[Test build #72577 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72577/testReport)**
 for PR 16848 at commit 
[`1146f26`](https://github.com/apache/spark/commit/1146f2676e57ac412acdea9b3ea4619194bedb4b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16853: [SPARK-19464][BUILD][HOTFIX] Add back mockito test dep i...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16853
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16853: [SPARK-19464][BUILD][HOTFIX] Add back mockito test dep i...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16853
  
**[Test build #72583 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72583/testReport)**
 for PR 16853 at commit 
[`c791fdb`](https://github.com/apache/spark/commit/c791fdb8abdcda60bb3c06fe06cca7f77ea9bdc6).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16853: [SPARK-19464][BUILD][HOTFIX] Add back mockito test dep i...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16853
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72583/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16853: [SPARK-19464][BUILD][HOTFIX] Add back mockito test dep i...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16853
  
**[Test build #72583 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72583/testReport)**
 for PR 16853 at commit 
[`c791fdb`](https://github.com/apache/spark/commit/c791fdb8abdcda60bb3c06fe06cca7f77ea9bdc6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16837
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16837
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72578/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16837
  
**[Test build #72578 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72578/testReport)**
 for PR 16837 at commit 
[`329886e`](https://github.com/apache/spark/commit/329886e54d3a70e2314d67b6b6060fc33cef9b8d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16810: [SPARK-19464][CORE][YARN][test-hadoop2.6] Remove ...

2017-02-08 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16810#discussion_r100065275
  
--- Diff: resource-managers/yarn/pom.xml ---
@@ -125,34 +125,12 @@
       <scope>test</scope>
     </dependency>
 
-
-
     <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-yarn-server-tests</artifactId>
       <classifier>tests</classifier>
       <scope>test</scope>
     </dependency>
-
--- End diff --

Oops, mockito ended up being necessary, though only according to the Maven 
build


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16853: [SPARK-19464][BUILD][HOTFIX] Add back mockito tes...

2017-02-08 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/16853

[SPARK-19464][BUILD][HOTFIX] Add back mockito test dep in YARN module, as 
it ends up being required in a Maven build

Add back mockito test dep in YARN module, as it ends up being required in a 
Maven build
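
For illustration, the restored block would look something like this in 
resource-managers/yarn/pom.xml (coordinates assumed from the mockito 
artifact the build already uses elsewhere; the version is managed by the 
parent pom):

    <dependency>
      <groupId>org.mockito</groupId>
      <artifactId>mockito-core</artifactId>
      <scope>test</scope>
    </dependency>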

## How was this patch tested?

PR builder again, but also a local `mvn` run using the command that the 
broken Jenkins job uses

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-19464.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16853.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16853


commit c791fdb8abdcda60bb3c06fe06cca7f77ea9bdc6
Author: Sean Owen 
Date:   2017-02-08T13:31:51Z

Add back mockito test dep in YARN module, as it ends up being required in a 
Maven build




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16787
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16787
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72580/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...

2017-02-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16787
  
**[Test build #72580 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72580/testReport)**
 for PR 16787 at commit 
[`a3c9f5e`](https://github.com/apache/spark/commit/a3c9f5e4a754ceee2ffb71c3da49221001b1bf2c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16386#discussion_r100064532
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
 ---
@@ -298,22 +312,22 @@ class JacksonParser(
 // Here, we pass empty `PartialFunction` so that this case can be
 // handled as a failed conversion. It will throw an exception as
 // long as the value is not null.
-parseJsonToken(parser, dataType)(PartialFunction.empty[JsonToken, 
Any])
+parseJsonToken[AnyRef](parser, 
dataType)(PartialFunction.empty[JsonToken, AnyRef])
   }
 
   /**
* This method skips `FIELD_NAME`s at the beginning, and handles nulls 
ahead before trying
* to parse the JSON token using given function `f`. If the `f` failed 
to parse and convert the
* token, call `failedConversion` to handle the token.
*/
-  private def parseJsonToken(
+  private def parseJsonToken[R >: Null](
--- End diff --

what does `>: Null` mean?
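
(As a standalone Scala sketch, separate from the Spark code above: a lower 
bound of `Null` restricts the type parameter to nullable reference types, 
which lets the method return `null` without a cast.)

    // R >: Null means R must be a supertype of Null, i.e. a type that
    // can hold null. So `null` is a valid value of type R here.
    def headOrNull[R >: Null](xs: Seq[R]): R =
      if (xs.isEmpty) null else xs.head

    headOrNull(Seq("a", "b"))        // "a"
    headOrNull(Seq.empty[String])    // null
    // headOrNull[Int](Seq(1, 2)) does not compile: Int is not >: Null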


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16386#discussion_r100064266
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
 ---
@@ -227,66 +267,71 @@ class JacksonParser(
   }
 
 case TimestampType =>
-  (parser: JsonParser) => parseJsonToken(parser, dataType) {
+  (parser: JsonParser) => parseJsonToken[java.lang.Long](parser, 
dataType) {
 case VALUE_STRING =>
+  val stringValue = parser.getText
   // This one will lose microseconds parts.
   // See https://issues.apache.org/jira/browse/SPARK-10681.
-  Try(options.timestampFormat.parse(parser.getText).getTime * 
1000L)
-.getOrElse {
-  // If it fails to parse, then tries the way used in 2.0 and 
1.x for backwards
-  // compatibility.
-  DateTimeUtils.stringToTime(parser.getText).getTime * 1000L
-}
+  Long.box {
--- End diff --

I don't think this makes the code more readable...
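
(For context, a simplified standalone sketch of the boxed form under 
discussion; `SimpleDateFormat` and `java.sql.Timestamp.valueOf` stand in 
for Spark's configurable format and `DateTimeUtils.stringToTime`.)

    import java.text.SimpleDateFormat
    import scala.util.Try

    val format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss")

    // Explicit boxing makes the expression's type java.lang.Long, to line
    // up with parseJsonToken[java.lang.Long]; the readability question is
    // whether the extra `Long.box { ... }` wrapper is worth that.
    def parseMicros(text: String): java.lang.Long = Long.box {
      Try(format.parse(text).getTime * 1000L)
        .getOrElse(java.sql.Timestamp.valueOf(text).getTime * 1000L)
    }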


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

2017-02-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16837
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...

2017-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16386#discussion_r100063010
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -160,7 +164,17 @@ public void writeTo(OutputStream out) throws 
IOException {
 throw new ArrayIndexOutOfBoundsException();
   }
 
-  out.write(bytes, (int) arrayOffset, numBytes);
+  return ByteBuffer.wrap(bytes, (int) arrayOffset, numBytes);
+} else {
+  return null;
--- End diff --

will it be more consistent if we return `ByteBuffer.wrap(getBytes)` here?
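
(A simplified standalone sketch of the two behaviors being compared; 
`copyOut` stands in for the copying path, e.g. `getBytes` in the real 
class.)

    import java.nio.ByteBuffer

    def toByteBuffer(
        onHeap: Array[Byte],  // backing array, or null if bytes are off-heap
        offset: Int,
        len: Int,
        copyOut: () => Array[Byte]): ByteBuffer =
      if (onHeap != null) {
        ByteBuffer.wrap(onHeap, offset, len)  // zero-copy wrap
      } else {
        ByteBuffer.wrap(copyOut())  // copy first, then wrap, rather than
      }                             // returning null to the caller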


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16373: [SPARK-18961][SQL] Support `SHOW TABLE EXTENDED ... PART...

2017-02-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16373
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


