[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16736 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16850: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16850 Merged build finished. Test PASSed.
[GitHub] spark issue #16850: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16850 **[Test build #3563 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3563/testReport)** for PR 16850 at commit [`5025cb7`](https://github.com/apache/spark/commit/5025cb7511a43e24cb3a181eb7b06c69b024479f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...
Github user NathanHowell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16386#discussion_r100104738

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -298,22 +312,22 @@ class JacksonParser(
           // Here, we pass empty `PartialFunction` so that this case can be
           // handled as a failed conversion. It will throw an exception as
           // long as the value is not null.
    -      parseJsonToken(parser, dataType)(PartialFunction.empty[JsonToken, Any])
    +      parseJsonToken[AnyRef](parser, dataType)(PartialFunction.empty[JsonToken, AnyRef])
       }

       /**
        * This method skips `FIELD_NAME`s at the beginning, and handles nulls ahead before trying
        * to parse the JSON token using given function `f`. If the `f` failed to parse and convert the
        * token, call `failedConversion` to handle the token.
        */
    -  private def parseJsonToken(
    +  private def parseJsonToken[R >: Null](
    --- End diff --

    It states that `R` must be a nullable type. This enables `null: R` to compile and is preferable to the runtime cast `null.asInstanceOf[R]` because it is verified at compile time.
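The point about the lower bound can be illustrated with a minimal sketch that is not taken from the pull request; `NullableBound`, `orNull`, and `orNullCast` are hypothetical names chosen for this example. It shows that with `R >: Null` a plain `null` type-checks as an `R`, whereas an unbounded type parameter has to fall back to `null.asInstanceOf[R]`.

```scala
object NullableBound {
  // `R >: Null` is a lower bound: Null must be a subtype of R,
  // so R can only be instantiated with a nullable (reference) type.
  def orNull[R >: Null](value: Option[R]): R = value match {
    case Some(v) => v
    case None    => null // accepted by the compiler because Null <: R
  }

  // Without the bound, the compiler cannot prove that null is an R,
  // so an unchecked cast is the only way to return null.
  def orNullCast[R](value: Option[R]): R = value match {
    case Some(v) => v
    case None    => null.asInstanceOf[R]
  }

  def main(args: Array[String]): Unit = {
    val s: String = orNull[String](None)
    println(s == null)

    // orNull[Int](None) is rejected at compile time, since Int is not a
    // supertype of Null. orNullCast[Int](None) compiles without complaint,
    // which is the kind of mistake the bound is meant to catch.
  }
}
```

The bounded version moves the "is this type nullable?" question from run time to the call site's type check, which appears to be the motivation given in the comment.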
[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...
Github user NathanHowell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16386#discussion_r100103739

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -227,66 +267,71 @@ class JacksonParser(
           }

         case TimestampType =>
    -      (parser: JsonParser) => parseJsonToken(parser, dataType) {
    +      (parser: JsonParser) => parseJsonToken[java.lang.Long](parser, dataType) {
             case VALUE_STRING =>
    +          val stringValue = parser.getText
               // This one will lose microseconds parts.
               // See https://issues.apache.org/jira/browse/SPARK-10681.
    -          Try(options.timestampFormat.parse(parser.getText).getTime * 1000L)
    -            .getOrElse {
    -              // If it fails to parse, then tries the way used in 2.0 and 1.x for backwards
    -              // compatibility.
    -              DateTimeUtils.stringToTime(parser.getText).getTime * 1000L
    -            }
    +          Long.box {
    --- End diff --

    This is needed to satisfy the type checker. The other approach is to explicitly specify the type in two locations: `Try[java.lang.Long](...).getOrElse[java.lang.Long](...)`. I found explicitly boxing to be more readable than the alternative.
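The two options described in the comment can be compared side by side in a small sketch. This is not the code from the pull request: `BoxedTimestamp`, `parsePrimary`, and `parseLegacy` are hypothetical stand-ins for the timestamp parsers in the diff, and only `scala.util.Try` and `Long.box` are real library calls.

```scala
import scala.util.Try

object BoxedTimestamp {
  // Hypothetical stand-ins for the primary and backwards-compatible parsers.
  private def parsePrimary(text: String): Long = text.trim.toLong
  private def parseLegacy(text: String): Long = text.trim.stripSuffix("ms").toLong

  // Option 1: compute a primitive Long and box it once at the end.
  def viaBox(text: String): java.lang.Long =
    Long.box {
      Try(parsePrimary(text) * 1000L).getOrElse(parseLegacy(text) * 1000L)
    }

  // Option 2: state the boxed type explicitly in both places.
  def viaTypeArgs(text: String): java.lang.Long =
    Try[java.lang.Long](parsePrimary(text) * 1000L)
      .getOrElse[java.lang.Long](parseLegacy(text) * 1000L)
}
```

Both methods return the same boxed value; the difference is only where the conversion from `Long` to `java.lang.Long` is spelled out, which is the readability trade-off the comment refers to.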
[GitHub] spark issue #16852: [SPARK-19512][SQL] codegen for compare structs fails
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16852 **[Test build #3565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3565/testReport)** for PR 16852 at commit [`9a8d853`](https://github.com/apache/spark/commit/9a8d8537748f38a4276188b3f60f6852010e3387). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16852: [SPARK-19512][SQL] codegen for compare structs fails
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16852 **[Test build #3565 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3565/testReport)** for PR 16852 at commit [`9a8d853`](https://github.com/apache/spark/commit/9a8d8537748f38a4276188b3f60f6852010e3387).
[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...
Github user NathanHowell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16386#discussion_r100101464

    --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
    @@ -160,7 +164,17 @@ public void writeTo(OutputStream out) throws IOException {
             throw new ArrayIndexOutOfBoundsException();
           }

    -      out.write(bytes, (int) arrayOffset, numBytes);
    +      return ByteBuffer.wrap(bytes, (int) arrayOffset, numBytes);
    +    } else {
    +      return null;
    --- End diff --

    It will allocate an extra object but would simplify the calling code... since it would be a short lived allocation it's probably fine to do this.
[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...
Github user NathanHowell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16386#discussion_r100100641

    --- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---
    @@ -194,5 +195,8 @@ class PortableDataStream(
       }

       def getPath(): String = path
    +
    +  @Since("2.2.0")
    --- End diff --

    This is a public class so I thought adding a `since` tag would benefit the documentation. If it's not desired I can certainly remove it.

    As for making the lazy val public vs private: I'm following the style used already in the class. There are public get methods for each private field. I'm not partial to either approach but prefer to keep it consistent.
[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...
Github user NathanHowell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16386#discussion_r100099791

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
    @@ -31,10 +31,17 @@ import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, CompressionCodecs
      * Most of these map directly to Jackson's internal options, specified in [[JsonParser.Feature]].
      */
     private[sql] class JSONOptions(
    -    @transient private val parameters: CaseInsensitiveMap)
    +    @transient private val parameters: CaseInsensitiveMap,
    +    defaultColumnNameOfCorruptRecord: String)
    --- End diff --

    Previously the `JSONOptions` instance was always passed around with a `columnNameOfCorruptRecord` value. This just makes it a field in `JSONOptions` instead, to put all options in one place. Since it's a required option it made more sense to use a field instead of making an entry in the `CaseInsensitiveMap`.
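The design choice described above (a required option becomes a constructor-supplied default and a field, while optional settings stay in the parameter map) can be sketched as follows. This is a simplified, hypothetical stand-in rather than Spark's `JSONOptions`: the class name `ParserOptions`, the plain `Map[String, String]`, and the exact lookup logic are assumptions made for illustration.

```scala
// Hypothetical, simplified stand-in for an options class; not Spark's JSONOptions.
class ParserOptions(
    parameters: Map[String, String],
    defaultColumnNameOfCorruptRecord: String) extends Serializable {

  // Required option: always resolved, either from the user-supplied
  // parameters or from the default passed in by the caller.
  val columnNameOfCorruptRecord: String =
    parameters.getOrElse("columnNameOfCorruptRecord", defaultColumnNameOfCorruptRecord)

  // Optional setting that stays in the map, with a fallback of false.
  val wholeFile: Boolean =
    parameters.get("wholeFile").exists(_.toBoolean)
}
```

With this shape, callers pass a single options object around instead of an options object plus a separate column name, which is the consolidation the comment describes.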
[GitHub] spark pull request #11760: [SPARK-13931] Resolve stage hanging up problem in...
Github user GavinGavinNo1 closed the pull request at: https://github.com/apache/spark/pull/11760
[GitHub] spark issue #11760: [SPARK-13931] Resolve stage hanging up problem in a part...
Github user GavinGavinNo1 commented on the issue: https://github.com/apache/spark/pull/11760 @kayousterhout I ran into a git conflict, so I created a new branch and a new pull request. Please refer to https://github.com/apache/spark/pull/16855. I am closing this pull request for the time being. Thanks!
[GitHub] spark issue #16855: [SPARK-13931] Resolve stage hanging up problem in a part...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16855 Can one of the admins verify this patch?
[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...
Github user NathanHowell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16386#discussion_r100098008

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1764,4 +1769,125 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         val df2 = spark.read.option("PREfersdecimaL", "true").json(records)
         assert(df2.schema == schema)
       }
    +
    +  test("SPARK-18352: Parse normal multi-line JSON files (compressed)") {
    +    withTempDir { dir =>
    +      dir.delete()
    +      val path = dir.getCanonicalPath
    +      primitiveFieldAndType
    +        .toDF("value")
    +        .write
    +        .option("compression", "GzIp")
    +        .text(path)
    +
    +      new File(path).listFiles() match {
    +        case compressedFiles =>
    +          assert(compressedFiles.exists(_.getName.endsWith(".gz")))
    +      }
    +
    +      val jsonDF = spark.read.option("wholeFile", true).json(path)
    +      val jsonDir = new File(dir, "json").getCanonicalPath
    +      jsonDF.coalesce(1).write
    +        .format("json")
    +        .option("compression", "gZiP")
    +        .save(jsonDir)
    +
    +      new File(jsonDir).listFiles() match {
    +        case compressedFiles =>
    +          assert(compressedFiles.exists(_.getName.endsWith(".json.gz")))
    +      }
    +
    +      val jsonCopy = spark.read
    +        .format("json")
    +        .load(jsonDir)
    +
    +      assert(jsonCopy.count === jsonDF.count)
    +      val jsonCopySome = jsonCopy.selectExpr("string", "long", "boolean")
    +      val jsonDFSome = jsonDF.selectExpr("string", "long", "boolean")
    +      checkAnswer(jsonCopySome, jsonDFSome)
    +    }
    +  }
    +
    +  test("SPARK-18352: Parse normal multi-line JSON files (uncompressed)") {
    +    withTempDir { dir =>
    +      dir.delete()
    +      val path = dir.getCanonicalPath
    +      primitiveFieldAndType
    +        .toDF("value")
    +        .write
    +        .text(path)
    +
    +      val jsonDF = spark.read.option("wholeFile", true).json(path)
    +      val jsonDir = new File(dir, "json").getCanonicalPath
    +      jsonDF.coalesce(1).write
    +        .format("json")
    +        .save(jsonDir)
    +
    +      val compressedFiles = new File(jsonDir).listFiles()
    +      assert(compressedFiles.exists(_.getName.endsWith(".json")))
    +
    +      val jsonCopy = spark.read
    +        .format("json")
    +        .load(jsonDir)
    +
    +      assert(jsonCopy.count === jsonDF.count)
    +      val jsonCopySome = jsonCopy.selectExpr("string", "long", "boolean")
    +      val jsonDFSome = jsonDF.selectExpr("string", "long", "boolean")
    +      checkAnswer(jsonCopySome, jsonDFSome)
    +    }
    +  }
    +
    +  test("SPARK-18352: Expect one JSON document per file") {
    +    // the json parser terminates as soon as it sees a matching END_OBJECT or END_ARRAY token.
    +    // this might not be the optimal behavior but this test verifies that only the first value
    +    // is parsed and the rest are discarded.
    +
    +    // alternatively the parser could continue parsing following objects, which may further reduce
    +    // allocations by skipping the line reader entirely
    +
    +    withTempDir { dir =>
    +      dir.delete()
    +      val path = dir.getCanonicalPath
    +      primitiveFieldAndType
    +        .flatMap(Iterator.fill(3)(_) ++ Iterator("\n{invalid}"))
    --- End diff --

    sure
[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...
Github user NathanHowell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16386#discussion_r100097749

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala ---
    @@ -0,0 +1,213 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.datasources.json
    +
    +import java.io.InputStream
    +
    +import scala.reflect.ClassTag
    +
    +import com.fasterxml.jackson.core.{JsonFactory, JsonParser}
    +import com.google.common.io.ByteStreams
    +import org.apache.hadoop.conf.Configuration
    +import org.apache.hadoop.fs.{FileStatus, Path}
    +import org.apache.hadoop.io.{LongWritable, Text}
    +import org.apache.hadoop.mapreduce.Job
    +import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}
    +
    +import org.apache.spark.TaskContext
    +import org.apache.spark.input.{PortableDataStream, StreamInputFormat}
    +import org.apache.spark.rdd.{BinaryFileRDD, RDD}
    +import org.apache.spark.sql.{AnalysisException, SparkSession}
    +import org.apache.spark.sql.catalyst.InternalRow
    +import org.apache.spark.sql.catalyst.json.{CreateJacksonParser, JacksonParser, JSONOptions}
    +import org.apache.spark.sql.execution.datasources.{CodecStreams, HadoopFileLinesReader, PartitionedFile}
    +import org.apache.spark.sql.types.StructType
    +import org.apache.spark.unsafe.types.UTF8String
    +import org.apache.spark.util.Utils
    +
    +/**
    + * Common functions for parsing JSON files
    + * @tparam T A datatype containing the unparsed JSON, such as [[Text]] or [[String]]
    + */
    +abstract class JsonDataSource[T] extends Serializable {
    +  def isSplitable: Boolean
    +
    +  /**
    +   * Parse a [[PartitionedFile]] into 0 or more [[InternalRow]] instances
    +   */
    +  def readFile(
    +    conf: Configuration,
    +    file: PartitionedFile,
    +    parser: JacksonParser): Iterator[InternalRow]
    +
    +  /**
    +   * Create an [[RDD]] that handles the preliminary parsing of [[T]] records
    +   */
    +  protected def createBaseRdd(
    +    sparkSession: SparkSession,
    +    inputPaths: Seq[FileStatus]): RDD[T]
    +
    +  /**
    +   * A generic wrapper to invoke the correct [[JsonFactory]] method to allocate a [[JsonParser]]
    +   * for an instance of [[T]]
    +   */
    +  def createParser(jsonFactory: JsonFactory, value: T): JsonParser
    +
    +  final def infer(
    +      sparkSession: SparkSession,
    +      inputPaths: Seq[FileStatus],
    +      parsedOptions: JSONOptions): Option[StructType] = {
    +    if (inputPaths.nonEmpty) {
    +      val jsonSchema = InferSchema.infer(
    +        createBaseRdd(sparkSession, inputPaths),
    +        parsedOptions,
    +        createParser)
    +      checkConstraints(jsonSchema)
    +      Some(jsonSchema)
    +    } else {
    +      None
    +    }
    +  }
    +
    +  /** Constraints to be imposed on schema to be stored. */
    +  private def checkConstraints(schema: StructType): Unit = {
    +    if (schema.fieldNames.length != schema.fieldNames.distinct.length) {
    +      val duplicateColumns = schema.fieldNames.groupBy(identity).collect {
    +        case (x, ys) if ys.length > 1 => "\"" + x + "\""
    +      }.mkString(", ")
    +      throw new AnalysisException(s"Duplicate column(s) : $duplicateColumns found, " +
    +        s"cannot save to JSON format")
    +    }
    +  }
    +}
    +
    +object JsonDataSource {
    +  def apply(options: JSONOptions): JsonDataSource[_] = {
    +    if (options.wholeFile) {
    +      WholeFileJsonDataSource
    +    } else {
    +      TextInputJsonDataSource
    +    }
    +  }
    +
    +  /**
    +   * Create a new [[RDD]] via the supplied callback if there is at least one file to process,
    +   * otherwise an [[org.apache.spark.rdd.EmptyRDD]] will be returned.
    +   */
    +  def createBaseRddConf[T : ClassTag](
    --- End diff --

    Habit from working with languages that don't support overloading; I'll change this.
[GitHub] spark pull request #16855: [SPARK-13931] Resolve stage hanging up problem in...
GitHub user GavinGavinNo1 reopened a pull request:

    https://github.com/apache/spark/pull/16855

    [SPARK-13931] Resolve stage hanging up problem in a particular case

    ## What changes were proposed in this pull request?

    When function 'executorLost' is invoked in class 'TaskSetManager', it is important to check whether variable 'isZombie' is set to true.

    This pull request fixes the following hang:

    1. Enable speculation in the application.
    2. Run this app and suppose the last task of shuffleMapStage 1 finishes. To be precise: from the point of view of the DAG, this stage really has finished, and from the point of view of the TaskSetManager, variable 'isZombie' is set to true, but variable runningTasksSet isn't empty because of speculation.
    3. Suddenly, executor 3 is lost. TaskScheduler, receiving this signal, invokes the executorLost functions of all of rootPool's taskSetManagers. DAG, receiving this signal, removes all of this executor's outputLocs.
    4. TaskSetManager adds all of this executor's tasks to pendingTasks and tells DAG they will be resubmitted (attention: possibly not in time).
    5. DAG starts to submit a new waitingStage, let's say shuffleMapStage 2, and finds that shuffleMapStage 1 is its missing parent because some outputLocs were removed when the executor was lost. Then DAG submits shuffleMapStage 1 again.
    6. DAG still receives Task 'Resubmitted' signals from the old taskSetManager, and increases the number of pendingTasks of shuffleMapStage 1 each time. However, the old taskSetManager won't resolve new tasks to submit because its variable 'isZombie' is set to true.
    7. Finally, shuffleMapStage 1 never finishes in DAG, together with all stages depending on it.

    ## How was this patch tested?

    It's quite difficult to construct test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/GavinGavinNo1/spark resolve-stage-blocked2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16855.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16855

commit e15b2abedb6fcaf6bac8775f15bdd246fa22902e
Author: GavinGavinNo1
Date: 2017-02-08T14:51:59Z

    Resolve stage hanging up problem in a particular case
[GitHub] spark issue #16857: [SPARK-19517][SS] KafkaSource fails to initialize partit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16857 Can one of the admins verify this patch?
[GitHub] spark pull request #16857: [SPARK-19517][SS] KafkaSource fails to initialize...
GitHub user vitillo opened a pull request:

    https://github.com/apache/spark/pull/16857

    [SPARK-19517][SS] KafkaSource fails to initialize partition offsets

    ## What changes were proposed in this pull request?

    This patch fixes a bug in `KafkaSource` with the (de)serialization of the length of the JSON string that contains the initial partition offsets.

    ## How was this patch tested?

    I ran the test suite for spark-sql-kafka-0-10.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vitillo/spark kafka_source_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16857.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16857

commit b2523b920de2329878a37f7efc1e9dda5d969b79
Author: Roberto Agostino Vitillo
Date: 2017-02-08T15:07:40Z

    Fix (de)serialization of initial partition offsets.
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16736 Merged build finished. Test FAILed.
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16736 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72591/ Test FAILed.
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16736 **[Test build #72591 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72591/testReport)** for PR 16736 at commit [`314f6f8`](https://github.com/apache/spark/commit/314f6f8de6990b1c3bfddea503490a1797e25117). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16776#discussion_r100089611

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
    @@ -63,44 +63,49 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
        *   Note that values greater than 1 are accepted but give the same result as 1.
        * @return the approximate quantiles at the given probabilities
        *
    -   * @note NaN values will be removed from the numerical column before calculation
    +   * @note null and NaN values will be removed from the numerical column before calculation
        *
        * @since 2.0.0
        */
       def approxQuantile(
           col: String,
           probabilities: Array[Double],
           relativeError: Double): Array[Double] = {
    -    StatFunctions.multipleApproxQuantiles(df.select(col).na.drop(),
    -      Seq(col), probabilities, relativeError).head.toArray
    +    val res = approxQuantile(Array(col), probabilities, relativeError)
    +    if (res != null) {
    +      res.head
    +    } else {
    +      null
    +    }
       }

       /**
        * Calculates the approximate quantiles of numerical columns of a DataFrame.
    -   * @see [[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]] for
    -   *   detailed description.
    +   * @see `DataFrameStatsFunctions.approxQuantile` for detailed description.
        *
    -   * Note that rows containing any null or NaN values values will be removed before
    -   * calculation.
        * @param cols the names of the numerical columns
        * @param probabilities a list of quantile probabilities
        *   Each number must belong to [0, 1].
        *   For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
    -   * @param relativeError The relative target precision to achieve (>= 0).
    +   * @param relativeError The relative target precision to achieve (greater or equal to 0).
        *   If set to zero, the exact quantiles are computed, which could be very expensive.
        *   Note that values greater than 1 are accepted but give the same result as 1.
        * @return the approximate quantiles at the given probabilities of each column
        *
    -   * @note Rows containing any NaN values will be removed before calculation
    +   * @note Rows containing any null or NaN values will be removed before calculation
        *
        * @since 2.2.0
        */
       def approxQuantile(
           cols: Array[String],
           probabilities: Array[Double],
           relativeError: Double): Array[Array[Double]] = {
    -    StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*).na.drop(), cols,
    -      probabilities, relativeError).map(_.toArray).toArray
    +    try {
    +      StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*).na.drop(), cols,
    --- End diff --

    Originally there was never any na dropping in `approxQuantile` as far as I can recall. That was added in #14858. cc @srowen

    You could also simply change the na dropping to only drop from the cols passed as args for each version?
[GitHub] spark issue #16856: [SPARK-19516][DOC] update public doc to use SparkSession...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16856 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72590/ Test FAILed.
[GitHub] spark pull request #16856: [SPARK-19516][DOC] update public doc to use Spark...
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16856#discussion_r100089351

    --- Diff: docs/programming-guide.md ---
    @@ -77,9 +76,9 @@ In addition, if you wish to access an HDFS cluster, you need to add a dependency
     Finally, you need to import some Spark classes into your program. Add the following lines:

     {% highlight scala %}
    -import org.apache.spark.api.java.JavaSparkContext
    -import org.apache.spark.api.java.JavaRDD
    -import org.apache.spark.SparkConf
    +import org.apache.spark.api.java.JavaSparkContext;
    --- End diff --

    You don't want semicolons in Scala right?
[GitHub] spark issue #16856: [SPARK-19516][DOC] update public doc to use SparkSession...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16856

    **[Test build #72590 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72590/testReport)** for PR 16856 at commit [`18d6daa`](https://github.com/apache/spark/commit/18d6daa4bc08c265a3984b676cefacc377f72b74).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark issue #16856: [SPARK-19516][DOC] update public doc to use SparkSession...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16856

    Merged build finished. Test FAILed.
[GitHub] spark pull request #16856: [SPARK-19516][DOC] update public doc to use Spark...
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16856#discussion_r100089566

    --- Diff: docs/programming-guide.md ---
    @@ -244,13 +239,13 @@ use IPython, set the `PYSPARK_DRIVER_PYTHON` variable to `ipython` when running
     $ PYSPARK_DRIVER_PYTHON=ipython ./bin/pyspark
     {% endhighlight %}

    -To use the Jupyter notebook (previously known as the IPython notebook),
    --- End diff --

    Several extraneous whitespace changes but whatever
[GitHub] spark pull request #16736: [SPARK-19265][SQL][Follow-up] Configurable `table...
Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16736#discussion_r100089378

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfEntrySuite.scala ---
    @@ -164,6 +164,18 @@ class SQLConfEntrySuite extends SparkFunSuite {
         assert(conf.getConf(confEntry) === Some("a"))
       }

    +  test("checkValue()") {
    --- End diff --

    ah you're quite correct! let me update this.
[GitHub] spark issue #16856: [SPARK-19516][DOC] update public doc to use SparkSession...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16856

    **[Test build #72590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72590/testReport)** for PR 16856 at commit [`18d6daa`](https://github.com/apache/spark/commit/18d6daa4bc08c265a3984b676cefacc377f72b74).
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16736

    **[Test build #72591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72591/testReport)** for PR 16736 at commit [`314f6f8`](https://github.com/apache/spark/commit/314f6f8de6990b1c3bfddea503490a1797e25117).
[GitHub] spark pull request #16736: [SPARK-19265][SQL][Follow-up] Configurable `table...
Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16736#discussion_r100089218

    --- Diff: core/src/test/scala/org/apache/spark/internal/config/ConfigEntrySuite.scala ---
    @@ -128,6 +128,25 @@ class ConfigEntrySuite extends SparkFunSuite {
         assert(conf.get(transformationConf) === "bar")
       }

    +  test("conf entry: checkValue()") {
    +    def createConf(default: Int): ConfigEntry[Int] =
    +      ConfigBuilder(testKey("checkValue"))
    +        .intConf
    +        .checkValue(value => value >= 0, "value must be non-negative")
    +        .createWithDefault(default)
    +
    +    // this succeeds
    +    val conf = createConf(10)
    +
    +    // this fails because valueConverter() calls checkValue()
    +    val e1 = intercept[IllegalArgumentException] { conf.valueConverter("-1") }
    --- End diff --

    sure. thanks!
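The behaviour the quoted test relies on (a validator registered through `checkValue` that runs inside `valueConverter`) can be sketched in plain Scala roughly as follows. This is only an illustration under stated assumptions: `CheckValueSketch`, `IntEntry` and `IntEntryBuilder` are hypothetical stand-ins, not Spark's `ConfigBuilder` or `ConfigEntry`, and the sketch validates only converted strings because the thread does not say whether defaults are checked too.

```scala
object CheckValueSketch {
  // Hypothetical stand-in for a typed config entry; not Spark's ConfigEntry.
  final case class IntEntry(key: String, default: Int, valueConverter: String => Int)

  // Hypothetical builder that accumulates (validator, error message) pairs.
  final class IntEntryBuilder(key: String, checks: List[(Int => Boolean, String)] = Nil) {

    // Register one more validator; returns a new builder so calls can be chained.
    def checkValue(validator: Int => Boolean, errorMsg: String): IntEntryBuilder =
      new IntEntryBuilder(key, checks :+ ((validator, errorMsg)))

    // The converter parses the string and then runs every registered check,
    // so an invalid value fails at conversion time with IllegalArgumentException.
    def createWithDefault(default: Int): IntEntry = {
      val convert: String => Int = { s =>
        val v = s.trim.toInt
        checks.foreach { case (ok, msg) => require(ok(v), msg) }
        v
      }
      IntEntry(key, default, convert)
    }
  }

  // Mirrors the shape of createConf in the quoted test.
  def createConf(default: Int): IntEntry =
    new IntEntryBuilder("test.checkValue")
      .checkValue(value => value >= 0, "value must be non-negative")
      .createWithDefault(default)
}
```

Using `require` keeps the failure type aligned with the `intercept[IllegalArgumentException]` in the quoted diff. Building the check into the converter means every path that parses a string value goes through validation.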
[GitHub] spark issue #16856: [SPARK-19516][DOC] update public doc to use SparkSession...
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16856

    cc @sameeragarwal @hvanhovell
[GitHub] spark pull request #16856: [SPARK-19516][DOC] update public doc to use Spark...
GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/16856

    [SPARK-19516][DOC] update public doc to use SparkSession instead of SparkContext

    ## What changes were proposed in this pull request?

    After Spark 2.0, `SparkSession` becomes the new entry point of Spark applications. We should update the public documents to reflect this.

    ## How was this patch tested?

    N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark doc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16856.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16856

commit 18d6daa4bc08c265a3984b676cefacc377f72b74
Author: Wenchen Fan
Date:   2017-02-08T15:18:46Z

    update public doc to use SparkSession instead of SparkContext
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16787

    Merged build finished. Test FAILed.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16787

    Test FAILed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72589/
    Test FAILed.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16787

    **[Test build #72589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72589/testReport)** for PR 16787 at commit [`bf09f15`](https://github.com/apache/spark/commit/bf09f15ca7c90138312eb73b819131adf16ac040).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16787

    **[Test build #72589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72589/testReport)** for PR 16787 at commit [`bf09f15`](https://github.com/apache/spark/commit/bf09f15ca7c90138312eb73b819131adf16ac040).
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16787

    retest this please
[GitHub] spark issue #16853: [SPARK-19464][BUILD][HOTFIX][test-hadoop2.6] Add back mo...
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16853

    +1
[GitHub] spark issue #16854: [WIP][SPARK-15463][SQL] Add an API to load DataFrame fro...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16854

    **[Test build #72588 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72588/testReport)** for PR 16854 at commit [`eabb3f3`](https://github.com/apache/spark/commit/eabb3f3f83da2d74cb24bf483639c85f7466a56e).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `class UnivocityParser(`
[GitHub] spark issue #16854: [WIP][SPARK-15463][SQL] Add an API to load DataFrame fro...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16854

    Test FAILed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72588/
    Test FAILed.
[GitHub] spark pull request #16848: [SPARK-19279][SQL][Follow-up] Infer Schema for Hi...
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16848
[GitHub] spark issue #16854: [WIP][SPARK-15463][SQL] Add an API to load DataFrame fro...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16854

    Merged build finished. Test FAILed.
[GitHub] spark pull request #16855: [SPARK-13931] Resolve stage hanging up problem in...
Github user GavinGavinNo1 closed the pull request at:

    https://github.com/apache/spark/pull/16855
[GitHub] spark issue #16854: [WIP][SPARK-15463][SQL] Add an API to load DataFrame fro...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16854

    **[Test build #72588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72588/testReport)** for PR 16854 at commit [`eabb3f3`](https://github.com/apache/spark/commit/eabb3f3f83da2d74cb24bf483639c85f7466a56e).
[GitHub] spark issue #16855: [SPARK-13931] Resolve stage hanging up problem in a part...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16855

    Can one of the admins verify this patch?
[GitHub] spark issue #16848: [SPARK-19279][SQL][Follow-up] Infer Schema for Hive Serd...
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16848

    Thanks! Merging to master.
[GitHub] spark issue #16854: [WIP][SPARK-15463][SQL] Add an API to load DataFrame fro...
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16854

    Let me try to add Java one and fix comments more tomorrow.
[GitHub] spark issue #16804: [SPARK-19459][SQL] Add Hive datatype (char/varchar) to S...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16804

    Merged build finished. Test FAILed.
[GitHub] spark issue #16804: [SPARK-19459][SQL] Add Hive datatype (char/varchar) to S...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16804

    Test FAILed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72587/
    Test FAILed.
[GitHub] spark pull request #16855: [SPARK-13931] Resolve stage hanging up problem in...
GitHub user GavinGavinNo1 opened a pull request:

    https://github.com/apache/spark/pull/16855

    [SPARK-13931] Resolve stage hanging up problem in a particular case

    ## What changes were proposed in this pull request?

    When function 'executorLost' is invoked in class 'TaskSetManager', it matters whether variable 'isZombie' is already set to true.

    This pull request fixes the following hang:

    1. Enable speculation in the application.
    2. Run the app and suppose the last task of shuffleMapStage 1 finishes. To be precise: from the DAG's point of view this stage has really finished, and from the TaskSetManager's point of view variable 'isZombie' is set to true, but variable runningTasksSet isn't empty because of speculation.
    3. Suddenly, executor 3 is lost. TaskScheduler receives this signal and invokes the executorLost function of every taskSetManager in rootPool. DAG receives this signal and removes all of this executor's outputLocs.
    4. TaskSetManager adds all of this executor's tasks to pendingTasks and tells DAG they will be resubmitted (note: possibly not in time).
    5. DAG starts to submit a new waitingStage, say shuffleMapStage 2, and finds that shuffleMapStage 1 is its missing parent because some outputLocs were removed when the executor was lost. DAG then submits shuffleMapStage 1 again.
    6. DAG still receives Task 'Resubmitted' signals from the old taskSetManager and increases the number of pendingTasks of shuffleMapStage 1 each time. However, the old taskSetManager won't offer any new task to submit because its variable 'isZombie' is set to true.
    7. In the end, shuffleMapStage 1 never finishes in DAG, and neither do the stages depending on it.

    ## How was this patch tested?

    It's quite difficult to construct test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/GavinGavinNo1/spark resolve-stage-blocked2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16855.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16855

commit e15b2abedb6fcaf6bac8775f15bdd246fa22902e
Author: GavinGavinNo1
Date:   2017-02-08T14:51:59Z

    Resolve stage hanging up problem in a particular case
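The scenario in steps 3 to 6 can be modelled with a toy sketch in plain Scala. It assumes the fix amounts to skipping resubmission when the manager is already a zombie, which the description implies but this message does not show. `ZombieGuardSketch`, `ToyTaskSetManager`, `guardZombie` and `resubmittedAfterLoss` are hypothetical names, and nothing here is Spark's real `TaskSetManager`.

```scala
import scala.collection.mutable.ArrayBuffer

object ZombieGuardSketch {
  // Toy model: names echo the PR description, but this is not Spark's TaskSetManager.
  final class ToyTaskSetManager(
      tasksByExecutor: Map[String, Seq[Int]],
      guardZombie: Boolean) {

    // True once the task set has finished from the scheduler's point of view.
    var isZombie: Boolean = false

    // Task ids this manager has reported upstream as "Resubmitted".
    val resubmitted: ArrayBuffer[Int] = ArrayBuffer.empty[Int]

    // Without the guard, a zombie manager still reports resubmissions that it
    // will never schedule; with the guard, it stays silent.
    def executorLost(execId: String): Unit = {
      val skip = guardZombie && isZombie
      if (!skip) {
        resubmitted ++= tasksByExecutor.getOrElse(execId, Seq.empty)
      }
    }
  }

  // Replays "the stage finished, then executor 3 was lost" for either variant.
  def resubmittedAfterLoss(guardZombie: Boolean): Seq[Int] = {
    val tsm = new ToyTaskSetManager(Map("executor-3" -> Seq(7, 8)), guardZombie)
    tsm.isZombie = true
    tsm.executorLost("executor-3")
    tsm.resubmitted.toSeq
  }
}
```

In the unguarded variant the zombie manager reports tasks 7 and 8 as resubmitted even though it will never run them, which corresponds to the pending-task count that keeps growing in step 6.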
[GitHub] spark issue #16804: [SPARK-19459][SQL] Add Hive datatype (char/varchar) to S...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16804

    **[Test build #72587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72587/testReport)** for PR 16804 at commit [`e7ca0ea`](https://github.com/apache/spark/commit/e7ca0ead843f2c9650e690fe649be18fa6389e48).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark pull request #16776: [SPARK-19436][SQL] Add missing tests for approxQu...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16776#discussion_r100086037

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
    @@ -63,44 +63,49 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
        *   Note that values greater than 1 are accepted but give the same result as 1.
        * @return the approximate quantiles at the given probabilities
        *
    -   * @note NaN values will be removed from the numerical column before calculation
    +   * @note null and NaN values will be removed from the numerical column before calculation
        *
        * @since 2.0.0
        */
       def approxQuantile(
           col: String,
           probabilities: Array[Double],
           relativeError: Double): Array[Double] = {
    -    StatFunctions.multipleApproxQuantiles(df.select(col).na.drop(),
    -      Seq(col), probabilities, relativeError).head.toArray
    +    val res = approxQuantile(Array(col), probabilities, relativeError)
    +    if (res != null) {
    +      res.head
    +    } else {
    +      null
    +    }
       }

       /**
        * Calculates the approximate quantiles of numerical columns of a DataFrame.
    -   * @see [[DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile]] for
    -   *     detailed description.
    +   * @see `DataFrameStatsFunctions.approxQuantile` for detailed description.
        *
    -   * Note that rows containing any null or NaN values values will be removed before
    -   * calculation.
        * @param cols the names of the numerical columns
        * @param probabilities a list of quantile probabilities
        *   Each number must belong to [0, 1].
        *   For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
    -   * @param relativeError The relative target precision to achieve (>= 0).
    +   * @param relativeError The relative target precision to achieve (greater or equal to 0).
        *   If set to zero, the exact quantiles are computed, which could be very expensive.
        *   Note that values greater than 1 are accepted but give the same result as 1.
        * @return the approximate quantiles at the given probabilities of each column
        *
    -   * @note Rows containing any NaN values will be removed before calculation
    +   * @note Rows containing any null or NaN values will be removed before calculation
        *
        * @since 2.2.0
        */
       def approxQuantile(
           cols: Array[String],
           probabilities: Array[Double],
           relativeError: Double): Array[Array[Double]] = {
    -    StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*).na.drop(), cols,
    -      probabilities, relativeError).map(_.toArray).toArray
    +    try {
    +      StatFunctions.multipleApproxQuantiles(df.select(cols.map(col): _*).na.drop(), cols,
    --- End diff --

    @zhengruifeng Sure. If we want to make them consistent, I am fine.

    How about reverting https://github.com/apache/spark/pull/12135 at first? At the same time, we can work on the new solution.
[GitHub] spark issue #16804: [SPARK-19459][SQL] Add Hive datatype (char/varchar) to S...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16804

    **[Test build #72587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72587/testReport)** for PR 16804 at commit [`e7ca0ea`](https://github.com/apache/spark/commit/e7ca0ead843f2c9650e690fe649be18fa6389e48).
[GitHub] spark pull request #16854: [SPARK-15463][SQL] Add an API to load DataFrame f...
GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/16854

    [SPARK-15463][SQL] Add an API to load DataFrame from Dataset[String]

    ## What changes were proposed in this pull request?

    This PR proposes to add an API that loads `DataFrame` from `Dataset[String]`.

    It allows pre-processing before loading into CSV, which means allowing a lot of workarounds for many narrow cases.

    - Case 1 - pre-processing

      ```scala
      val df = spark.read.text("...")
      // Pre-processing with this.
      spark.read.csv(df.as[String])
      ```

    - Case 2 - use other input formats

      ```scala
      val rdd = spark.sparkContext.newAPIHadoopFile(
        "/file.csv.lzo",
        classOf[com.hadoop.mapreduce.LzoTextInputFormat],
        classOf[org.apache.hadoop.io.LongWritable],
        classOf[org.apache.hadoop.io.Text])
      spark.read.csv(rdd.toDS)
      ```

    ## How was this patch tested?

    Added tests in `CSVSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-15463

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16854.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16854

commit eabb3f3f83da2d74cb24bf483639c85f7466a56e
Author: hyukjinkwon
Date:   2017-02-08T14:46:55Z

    Add an API to load DataFrame from Dataset[String]
[GitHub] spark pull request #16804: [SPARK-19459][SQL] Add Hive datatype (char/varcha...
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16804#discussion_r100085484

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcSourceSuite.scala ---
    @@ -162,6 +162,40 @@ abstract class OrcSuite extends QueryTest with TestHiveSingleton with BeforeAndA
           hiveClient.runSqlHive("DROP TABLE IF EXISTS orc_varchar")
         }
       }
    +
    +  test("SPARK-19459: read char/varchar column written by Hive") {
    +    val hiveClient = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
    +    val location = Utils.createTempDir().toURI
    +    try {
    +      hiveClient.runSqlHive(
    +        """
    +          |CREATE EXTERNAL TABLE hive_orc(
    +          |  a STRING,
    +          |  b CHAR(10),
    +          |  c VARCHAR(10))
    +          |STORED AS orc""".stripMargin)
    +      // Hive throws an exception if I assign the location in the create table statement.
    +      hiveClient.runSqlHive(
    +        s"ALTER TABLE hive_orc SET LOCATION '$location'")
    +      hiveClient.runSqlHive(
    +        "INSERT INTO TABLE hive_orc SELECT 'a', 'b', 'c' FROM (SELECT 1) t")
    +
    --- End diff --

    Done.
[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16689 @felixcheung @titicaca Just to make sure I understand, collect on timestamp was getting `c("POSIXct", "POSIXt")` even before this change?
[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100083481 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -1764,4 +1769,125 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData { val df2 = spark.read.option("PREfersdecimaL", "true").json(records) assert(df2.schema == schema) } + + test("SPARK-18352: Parse normal multi-line JSON files (compressed)") { +withTempDir { dir => + dir.delete() + val path = dir.getCanonicalPath + primitiveFieldAndType +.toDF("value") +.write +.option("compression", "GzIp") +.text(path) + + new File(path).listFiles() match { +case compressedFiles => + assert(compressedFiles.exists(_.getName.endsWith(".gz"))) + } + + val jsonDF = spark.read.option("wholeFile", true).json(path) + val jsonDir = new File(dir, "json").getCanonicalPath + jsonDF.coalesce(1).write +.format("json") +.option("compression", "gZiP") +.save(jsonDir) + + new File(jsonDir).listFiles() match { +case compressedFiles => + assert(compressedFiles.exists(_.getName.endsWith(".json.gz"))) + } + + val jsonCopy = spark.read +.format("json") +.load(jsonDir) + + assert(jsonCopy.count === jsonDF.count) + val jsonCopySome = jsonCopy.selectExpr("string", "long", "boolean") + val jsonDFSome = jsonDF.selectExpr("string", "long", "boolean") + checkAnswer(jsonCopySome, jsonDFSome) +} + } + + test("SPARK-18352: Parse normal multi-line JSON files (uncompressed)") { +withTempDir { dir => + dir.delete() + val path = dir.getCanonicalPath + primitiveFieldAndType +.toDF("value") +.write +.text(path) + + val jsonDF = spark.read.option("wholeFile", true).json(path) + val jsonDir = new File(dir, "json").getCanonicalPath + jsonDF.coalesce(1).write +.format("json") +.save(jsonDir) + + val compressedFiles = new File(jsonDir).listFiles() + assert(compressedFiles.exists(_.getName.endsWith(".json"))) + + val jsonCopy = spark.read +.format("json") 
+.load(jsonDir) + + assert(jsonCopy.count === jsonDF.count) + val jsonCopySome = jsonCopy.selectExpr("string", "long", "boolean") + val jsonDFSome = jsonDF.selectExpr("string", "long", "boolean") + checkAnswer(jsonCopySome, jsonDFSome) +} + } + + test("SPARK-18352: Expect one JSON document per file") { +// the json parser terminates as soon as it sees a matching END_OBJECT or END_ARRAY token. +// this might not be the optimal behavior but this test verifies that only the first value +// is parsed and the rest are discarded. + +// alternatively the parser could continue parsing following objects, which may further reduce +// allocations by skipping the line reader entirely + +withTempDir { dir => + dir.delete() + val path = dir.getCanonicalPath + primitiveFieldAndType +.flatMap(Iterator.fill(3)(_) ++ Iterator("\n{invalid}")) --- End diff -- can we write json string literal to text file? it's hard to understand what's going on here... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
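The behavior described in the test comment above (parsing stops at the END_OBJECT or END_ARRAY that closes the first top-level value, and everything after it is discarded) can be illustrated with a small plain-Scala sketch. This is not Jackson and not the parser from the PR: `JsonSketch.firstJsonValue` is a hypothetical helper that only tracks bracket depth and string literals, and it assumes the first value is an object or an array.

```scala
object JsonSketch {
  /**
   * Returns the first balanced top-level object or array in `text`, if any.
   * Anything after the closing bracket of that value is ignored, even if it
   * is not valid JSON. Brackets are counted, not matched by kind.
   */
  def firstJsonValue(text: String): Option[String] = {
    var depth = 0          // current nesting level of {...} and [...]
    var start = -1         // index where the first value begins
    var end = -1           // index of the bracket that closes it
    var inString = false   // true while inside a string literal
    var escaped = false    // true right after a backslash inside a string
    var i = 0
    while (i < text.length && end < 0) {
      val c = text.charAt(i)
      if (inString) {
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else {
        c match {
          case '"' if depth > 0 => inString = true
          case '{' | '[' =>
            if (depth == 0) start = i
            depth += 1
          case '}' | ']' if depth > 0 =>
            depth -= 1
            if (depth == 0) end = i // the matching END_OBJECT / END_ARRAY
          case _ => // whitespace and scalar characters need no tracking
        }
      }
      i += 1
    }
    if (end >= 0) Some(text.substring(start, end + 1)) else None
  }
}
```

Given a file whose content is one valid document followed by `{invalid}`, the sketch returns only the first document, which mirrors what the test asserts about the real parser.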
[GitHub] spark issue #16831: [SPARK-19263] Fix race in SchedulerIntegrationSuite.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16831 **[Test build #3562 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3562/testReport)** for PR 16831 at commit [`67fe5df`](https://github.com/apache/spark/commit/67fe5dfe9d00c628c15078d8d99c5b0de3962946). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16386#discussion_r100082372

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1764,4 +1769,125 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         val df2 = spark.read.option("PREfersdecimaL", "true").json(records)
         assert(df2.schema == schema)
       }
    +
    +  test("SPARK-18352: Parse normal multi-line JSON files (compressed)") {
    +    withTempDir { dir =>
    +      dir.delete()
    --- End diff --

    looks like you need `withTempPath`
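For readers unfamiliar with the distinction behind this comment: the test creates a temporary directory and deletes it straight away, presumably because the writer needs a target path that does not exist yet. A helper that hands out a fresh, non-existent path and cleans up afterwards expresses that directly. The following is a minimal sketch of such a helper using only the JDK; `TempPathSketch` and its methods are hypothetical stand-ins, not Spark's test utilities.

```scala
import java.io.File
import java.nio.file.Files

object TempPathSketch {
  /** Recursively deletes `file`; does nothing if it does not exist. */
  private def deleteRecursively(file: File): Unit = {
    // listFiles() returns null for non-directories and missing paths.
    Option(file.listFiles()).foreach(_.foreach(deleteRecursively))
    file.delete()
  }

  /**
   * Passes `f` a unique path that does not exist yet, then removes
   * whatever `f` created there, even if `f` throws.
   */
  def withTempPath[A](f: File => A): A = {
    val path = Files.createTempDirectory("temp-path-sketch").toFile
    path.delete() // keep the unique name, but leave nothing on disk
    try f(path) finally deleteRecursively(path)
  }
}
```

With a helper of this shape the test body no longer needs its own `dir.delete()` call.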
[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100081170 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala --- @@ -0,0 +1,213 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.datasources.json + +import java.io.InputStream + +import scala.reflect.ClassTag + +import com.fasterxml.jackson.core.{JsonFactory, JsonParser} +import com.google.common.io.ByteStreams +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.{FileStatus, Path} +import org.apache.hadoop.io.{LongWritable, Text} +import org.apache.hadoop.mapreduce.Job +import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat} + +import org.apache.spark.TaskContext +import org.apache.spark.input.{PortableDataStream, StreamInputFormat} +import org.apache.spark.rdd.{BinaryFileRDD, RDD} +import org.apache.spark.sql.{AnalysisException, SparkSession} +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.json.{CreateJacksonParser, JacksonParser, JSONOptions} +import org.apache.spark.sql.execution.datasources.{CodecStreams, HadoopFileLinesReader, PartitionedFile} +import org.apache.spark.sql.types.StructType +import org.apache.spark.unsafe.types.UTF8String +import org.apache.spark.util.Utils + +/** + * Common functions for parsing JSON files + * @tparam T A datatype containing the unparsed JSON, such as [[Text]] or [[String]] + */ +abstract class JsonDataSource[T] extends Serializable { + def isSplitable: Boolean + + /** + * Parse a [[PartitionedFile]] into 0 or more [[InternalRow]] instances + */ + def readFile( +conf: Configuration, +file: PartitionedFile, +parser: JacksonParser): Iterator[InternalRow] + + /** + * Create an [[RDD]] that handles the preliminary parsing of [[T]] records + */ + protected def createBaseRdd( +sparkSession: SparkSession, +inputPaths: Seq[FileStatus]): RDD[T] + + /** + * A generic wrapper to invoke the correct [[JsonFactory]] method to allocate a [[JsonParser]] + * for an instance of [[T]] + */ + def createParser(jsonFactory: JsonFactory, value: T): JsonParser + + final def infer( + sparkSession: SparkSession, + inputPaths: Seq[FileStatus], + 
parsedOptions: JSONOptions): Option[StructType] = { +if (inputPaths.nonEmpty) { + val jsonSchema = InferSchema.infer( +createBaseRdd(sparkSession, inputPaths), +parsedOptions, +createParser) + checkConstraints(jsonSchema) + Some(jsonSchema) +} else { + None +} + } + + /** Constraints to be imposed on schema to be stored. */ + private def checkConstraints(schema: StructType): Unit = { +if (schema.fieldNames.length != schema.fieldNames.distinct.length) { + val duplicateColumns = schema.fieldNames.groupBy(identity).collect { +case (x, ys) if ys.length > 1 => "\"" + x + "\"" + }.mkString(", ") + throw new AnalysisException(s"Duplicate column(s) : $duplicateColumns found, " + +s"cannot save to JSON format") +} + } +} + +object JsonDataSource { + def apply(options: JSONOptions): JsonDataSource[_] = { +if (options.wholeFile) { + WholeFileJsonDataSource +} else { + TextInputJsonDataSource +} + } + + /** + * Create a new [[RDD]] via the supplied callback if there is at least one file to process, + * otherwise an [[org.apache.spark.rdd.EmptyRDD]] will be returned. + */ + def createBaseRddConf[T : ClassTag]( --- End diff -- why call it `createBaseRddConf` instead of `createBaseRdd`? --- If your project is set up for it, you can reply to this email and
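As a side note on the `checkConstraints` method quoted in the diff above, its duplicate-column detection can be tried out in isolation. The sketch below reproduces the same `groupBy(identity)` idea in plain Scala; `DuplicateColumns` is a hypothetical name, and `IllegalArgumentException` stands in for Spark's `AnalysisException` so the example has no dependencies.

```scala
object DuplicateColumns {
  /** Names that occur more than once in `fieldNames`, sorted for stable output. */
  def duplicates(fieldNames: Seq[String]): Seq[String] =
    fieldNames.groupBy(identity).collect {
      case (name, occurrences) if occurrences.length > 1 => name
    }.toSeq.sorted

  /** Throws if any name repeats; otherwise returns normally. */
  def checkConstraints(fieldNames: Seq[String]): Unit = {
    val dups = duplicates(fieldNames)
    if (dups.nonEmpty) {
      // Quote each name, as the message in the diff does.
      val quoted = dups.map(n => "\"" + n + "\"").mkString(", ")
      throw new IllegalArgumentException(
        s"Duplicate column(s) : $quoted found, cannot save to JSON format")
    }
  }
}
```

Sorting is an addition made here for deterministic messages; the quoted code reports duplicates in map iteration order.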
[GitHub] spark issue #16853: [SPARK-19464][BUILD][HOTFIX][test-hadoop2.6] Add back mo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16853 **[Test build #3564 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3564/testReport)** for PR 16853 at commit [`c791fdb`](https://github.com/apache/spark/commit/c791fdb8abdcda60bb3c06fe06cca7f77ea9bdc6).
[GitHub] spark issue #16850: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16850 **[Test build #72586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72586/testReport)** for PR 16850 at commit [`5025cb7`](https://github.com/apache/spark/commit/5025cb7511a43e24cb3a181eb7b06c69b024479f).
[GitHub] spark pull request #16841: [SPARK-18871][SQL][TESTS] New test cases for IN/N...
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16841#discussion_r100076749 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-having.sql.out --- @@ -0,0 +1,217 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 12 + + +-- !query 0 +create temporary view t1 as select * from values + ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'), + ("val1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("val1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 01:02:00.001', date '2014-06-04'), + ("val1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'), + ("val1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:02:00.001', date '2014-05-05'), + ("val1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', null), + ("val1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 01:02:00.001', null), + ("val1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-04'), + ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 01:02:00.001', date '2014-09-04'), + ("val1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'), + ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:02:00.001', date '2014-04-04'), + ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04') + as t1(t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i) +-- !query 0 schema +struct<> +-- !query 0 output + + + +-- !query 1 +create temporary view t2 as select * from values + ("val2a", 6S, 12, 14L, float(15), 20D, 20E2, timestamp '2014-04-04 01:01:00.000', date '2014-04-04'), + ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04'), + ("val1b", 8S, 16, 119L, float(17), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'), + ("val1c", 12S, 16, 219L, float(17), 25D, 26E2, timestamp '2016-05-04 01:01:00.000', date '2016-05-04'), + ("val1b", null, 16, 319L, float(17), 25D, 26E2, timestamp '2017-05-04 01:01:00.000', null), + ("val2e", 8S, null, 419L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'), + ("val1f", 19S, null, 519L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'), + ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'), + ("val1c", 12S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-05'), + ("val1e", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:01:00.000', date '2014-09-04'), + ("val1f", 19S, null, 19L, float(17), 25D, 26E2, timestamp '2014-10-04 01:01:00.000', date '2014-10-04'), + ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', null) + as t2(t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i) +-- !query 1 schema +struct<> +-- !query 1 output + + + +-- !query 2 +create temporary view t3 as select * from values + ("val3a", 6S, 12, 110L, float(15), 20D, 20E2, timestamp '2014-04-04 01:02:00.000', date '2014-04-04'), + ("val3a", 6S, 12, 10L, float(15), 20D, 20E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("val1b", 10S, 12, 219L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("val1b", 8S, 16, 319L, float(17), 25D, 26E2, timestamp '2014-06-04 01:02:00.000', date '2014-06-04'), + ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:02:00.000', date '2014-07-04'), + ("val3c", 17S, 16, 
519L, float(17), 25D, 26E2, timestamp '2014-08-04 01:02:00.000', date '2014-08-04'), + ("val3c", 17S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:02:00.000', date '2014-09-05'), + ("val1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 01:02:00.000', null), + ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 01:02:00.000', null), + ("val3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("val3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 01:02:00.000', date '2015-05-04') + as t3(t3a, t3b, t3c, t3d, t3e, t3f, t3g, t3h, t3i) +-- !query 2 schema
[GitHub] spark pull request #16841: [SPARK-18871][SQL][TESTS] New test cases for IN/N...
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16841#discussion_r100077204 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-joins.sql.out --- @@ -0,0 +1,353 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 14 + + +-- !query 0 +create temporary view t1 as select * from values + ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'), + ("val1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("val1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 01:02:00.001', date '2014-06-04'), + ("val1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'), + ("val1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:02:00.001', date '2014-05-05'), + ("val1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', null), + ("val1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 01:02:00.001', null), + ("val1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-04'), + ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 01:02:00.001', date '2014-09-04'), + ("val1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'), + ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:02:00.001', date '2014-04-04'), + ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04') + as t1(t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i) +-- !query 0 schema +struct<> +-- !query 0 output + + + +-- !query 1 +create temporary view t2 as select * from values + ("val2a", 6S, 12, 14L, float(15), 20D, 20E2, timestamp '2014-04-04 01:01:00.000', date '2014-04-04'), + ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04'), + ("val1b", 8S, 16, 119L, float(17), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'), + ("val1c", 12S, 16, 219L, float(17), 25D, 26E2, timestamp '2016-05-04 01:01:00.000', date '2016-05-04'), + ("val1b", null, 16, 319L, float(17), 25D, 26E2, timestamp '2017-05-04 01:01:00.000', null), + ("val2e", 8S, null, 419L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'), + ("val1f", 19S, null, 519L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'), + ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'), + ("val1c", 12S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-05'), + ("val1e", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:01:00.000', date '2014-09-04'), + ("val1f", 19S, null, 19L, float(17), 25D, 26E2, timestamp '2014-10-04 01:01:00.000', date '2014-10-04'), + ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', null) + as t2(t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i) +-- !query 1 schema +struct<> +-- !query 1 output + + + +-- !query 2 +create temporary view t3 as select * from values + ("val3a", 6S, 12, 110L, float(15), 20D, 20E2, timestamp '2014-04-04 01:02:00.000', date '2014-04-04'), + ("val3a", 6S, 12, 10L, float(15), 20D, 20E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("val1b", 10S, 12, 219L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("val1b", 8S, 16, 319L, float(17), 25D, 26E2, timestamp '2014-06-04 01:02:00.000', date '2014-06-04'), + ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:02:00.000', date '2014-07-04'), + ("val3c", 17S, 16, 
519L, float(17), 25D, 26E2, timestamp '2014-08-04 01:02:00.000', date '2014-08-04'), + ("val3c", 17S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:02:00.000', date '2014-09-05'), + ("val1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 01:02:00.000', null), + ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 01:02:00.000', null), + ("val3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("val3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 01:02:00.000', date '2015-05-04') + as t3(t3a, t3b, t3c, t3d, t3e, t3f, t3g, t3h, t3i) +-- !query 2 schema
[GitHub] spark issue #16850: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16850 **[Test build #3563 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3563/testReport)** for PR 16850 at commit [`5025cb7`](https://github.com/apache/spark/commit/5025cb7511a43e24cb3a181eb7b06c69b024479f).
[GitHub] spark pull request #16841: [SPARK-18871][SQL][TESTS] New test cases for IN/N...
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16841#discussion_r100077423 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-multiple-columns.sql.out --- @@ -0,0 +1,178 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 8 + + +-- !query 0 +create temporary view t1 as select * from values + ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:00:00.000', date '2014-04-04'), + ("val1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("val1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 01:02:00.001', date '2014-06-04'), + ("val1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'), + ("val1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:02:00.001', date '2014-05-05'), + ("val1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', null), + ("val1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 01:02:00.001', null), + ("val1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-04'), + ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 01:02:00.001', date '2014-09-04'), + ("val1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'), + ("val1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 01:02:00.001', date '2014-04-04'), + ("val1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04') + as t1(t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i) +-- !query 0 schema +struct<> +-- !query 0 output + + + +-- !query 1 +create temporary view t2 as select * from values + ("val2a", 6S, 12, 14L, float(15), 20D, 20E2, timestamp '2014-04-04 01:01:00.000', date '2014-04-04'), + ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04'), + ("val1b", 8S, 16, 119L, float(17), 25D, 26E2, timestamp '2015-05-04 01:01:00.000', date '2015-05-04'), + ("val1c", 12S, 16, 219L, float(17), 25D, 26E2, timestamp '2016-05-04 01:01:00.000', date '2016-05-04'), + ("val1b", null, 16, 319L, float(17), 25D, 26E2, timestamp '2017-05-04 01:01:00.000', null), + ("val2e", 8S, null, 419L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'), + ("val1f", 19S, null, 519L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', date '2014-05-04'), + ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-06-04 01:01:00.000', date '2014-06-04'), + ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:01:00.000', date '2014-07-04'), + ("val1c", 12S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-08-04 01:01:00.000', date '2014-08-05'), + ("val1e", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:01:00.000', date '2014-09-04'), + ("val1f", 19S, null, 19L, float(17), 25D, 26E2, timestamp '2014-10-04 01:01:00.000', date '2014-10-04'), + ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:01:00.000', null) + as t2(t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i) +-- !query 1 schema +struct<> +-- !query 1 output + + + +-- !query 2 +create temporary view t3 as select * from values + ("val3a", 6S, 12, 110L, float(15), 20D, 20E2, timestamp '2014-04-04 01:02:00.000', date '2014-04-04'), + ("val3a", 6S, 12, 10L, float(15), 20D, 20E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("val1b", 10S, 12, 219L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("val1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("val1b", 8S, 16, 319L, float(17), 25D, 26E2, timestamp '2014-06-04 01:02:00.000', date '2014-06-04'), + ("val1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 01:02:00.000', date '2014-07-04'), + ("val3c", 17S, 16, 
519L, float(17), 25D, 26E2, timestamp '2014-08-04 01:02:00.000', date '2014-08-04'), + ("val3c", 17S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 01:02:00.000', date '2014-09-05'), + ("val1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 01:02:00.000', null), + ("val1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 01:02:00.000', null), + ("val3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 01:02:00.000', date '2014-05-04'), + ("val3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 01:02:00.000', date '2015-05-04') + as t3(t3a, t3b, t3c, t3d, t3e, t3f, t3g, t3h, t3i) +-- !query 2
[GitHub] spark issue #16850: [SPARK-19413][SS] MapGroupsWithState for arbitrary state...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/16850 jenkins test this please
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16760 Merged build finished. Test FAILed.
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16760 **[Test build #72585 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72585/testReport)** for PR 16760 at commit [`2473e0c`](https://github.com/apache/spark/commit/2473e0c440a9d1cd761ae6d704d0aa02c63afd83). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16760 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72585/ Test FAILed.
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16760 **[Test build #72585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72585/testReport)** for PR 16760 at commit [`2473e0c`](https://github.com/apache/spark/commit/2473e0c440a9d1cd761ae6d704d0aa02c63afd83).
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16760 retest this please
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72584/ Test FAILed.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Merged build finished. Test FAILed.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72584/testReport)** for PR 16787 at commit [`bf09f15`](https://github.com/apache/spark/commit/bf09f15ca7c90138312eb73b819131adf16ac040). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72584/testReport)** for PR 16787 at commit [`bf09f15`](https://github.com/apache/spark/commit/bf09f15ca7c90138312eb73b819131adf16ac040).
[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/16787#discussion_r100070021
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -463,117 +459,6 @@ private[spark] object HiveUtils extends Logging {
     case (other, tpe) if primitiveTypes contains tpe => other.toString
   }

-  /** Converts the native StructField to Hive's FieldSchema. */
-  private def toHiveColumn(c: StructField): FieldSchema = {
-    val typeString = if (c.metadata.contains(HiveUtils.hiveTypeString)) {
-      c.metadata.getString(HiveUtils.hiveTypeString)
-    } else {
-      c.dataType.catalogString
-    }
-    new FieldSchema(c.name, typeString, c.getComment.orNull)
-  }
-
-  /** Builds the native StructField from Hive's FieldSchema. */
-  private def fromHiveColumn(hc: FieldSchema): StructField = {
-    val columnType = try {
-      CatalystSqlParser.parseDataType(hc.getType)
-    } catch {
-      case e: ParseException =>
-        throw new SparkException("Cannot recognize hive type string: " + hc.getType, e)
-    }
-
-    val metadata = new MetadataBuilder().putString(HiveUtils.hiveTypeString, hc.getType).build()
-    val field = StructField(
-      name = hc.getName,
-      dataType = columnType,
-      nullable = true,
-      metadata = metadata)
-    Option(hc.getComment).map(field.withComment).getOrElse(field)
-  }
-
-  // TODO: merge this with HiveClientImpl#toHiveTable
-  /** Converts the native table metadata representation format CatalogTable to Hive's Table. */
-  def toHiveTable(catalogTable: CatalogTable): HiveTable = {
--- End diff --
This method has been deleted; callers now use HiveClientImpl.toHiveTable, which uses the shim to set the location. In HiveClientImpl the Hive version may not be the same as the default Hive (1.2.1), so it uses the runtime shim to call setDataLocation. The deleted HiveUtils.toHiveTable was only used for runtime Hive execution, not for interacting with the metastore.
[GitHub] spark issue #16831: [SPARK-19263] Fix race in SchedulerIntegrationSuite.
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16831 @squito Many thanks for your help. You are such a kind person : )
[GitHub] spark issue #16831: [SPARK-19263] Fix race in SchedulerIntegrationSuite.
Github user squito commented on the issue: https://github.com/apache/spark/pull/16831
@jinxing64 that way of testing is fine, but I find it's much faster to use sbt. http://www.scala-sbt.org/0.13/docs/Testing.html
```
build/sbt -Pyarn -Phadoop-2.6 -Phive-thriftserver -Dhadoop.version=2.6.5
[this will put you in an sbt console]
> project core
> testOnly *DAGSchedulerSuite
[run all tests that match the pattern -- in this case, one suite]
> testOnly *spark.scheduler.*
[this time we run everything in the scheduler package]
> ~testOnly *DAGSchedulerSuite
[the '~' in front means that as we modify the code (e.g. in another terminal or an IDE), sbt will re-run the tests every time the source changes.]
> ~testOnly *DAGSchedulerSuite -- -z "SPARK-12345"
[as above, but only run tests within that suite whose name matches the pattern]
```
The last variant is the quickest way for me to run one test repeatedly as I'm developing. Because it runs every time I save changes to disk, it often runs when my code is in some bad state and everything fails. But that's no big deal: it just runs again when I fix things, so I ignore the window with the running tests until I think I have things in an OK state.
There is some more description of the arguments to scalatest itself (e.g. `-z`) at http://www.scalatest.org/user_guide/using_the_runner
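As a rough illustration of the `-z` filter mentioned in the message above, the following is a minimal sketch, assuming only what the ScalaTest runner guide states, namely that `-z` selects tests whose names include the given substring. It models that selection in plain Scala and is not ScalaTest's implementation; the object name, the `select` helper, and the test names are all hypothetical.

```scala
object NameFilterSketch {
  // Hypothetical test names, standing in for the tests registered in one suite.
  val testNames: Seq[String] = Seq(
    "SPARK-12345: hypothetical regression test",
    "submit a simple job",
    "SPARK-99999: another hypothetical test"
  )

  // Keep only the names that contain `pattern` as a substring
  // (an assumption-level model of what `-z "SPARK-12345"` selects).
  def select(names: Seq[String], pattern: String): Seq[String] =
    names.filter(_.contains(pattern))

  def main(args: Array[String]): Unit = {
    // Print the names that a substring filter on "SPARK-12345" would keep.
    select(testNames, "SPARK-12345").foreach(println)
  }
}
```

Putting the JIRA id in the test name, as the `"SPARK-12345"` pattern in the message suggests, is what makes this kind of substring selection convenient.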
[GitHub] spark issue #16848: [SPARK-19279][SQL][Follow-up] Infer Schema for Hive Serd...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16848 Merged build finished. Test PASSed.
[GitHub] spark issue #16848: [SPARK-19279][SQL][Follow-up] Infer Schema for Hive Serd...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16848 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72577/ Test PASSed.
[GitHub] spark issue #16848: [SPARK-19279][SQL][Follow-up] Infer Schema for Hive Serd...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16848 **[Test build #72577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72577/testReport)** for PR 16848 at commit [`1146f26`](https://github.com/apache/spark/commit/1146f2676e57ac412acdea9b3ea4619194bedb4b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16853: [SPARK-19464][BUILD][HOTFIX] Add back mockito test dep i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16853 Merged build finished. Test FAILed.
[GitHub] spark issue #16853: [SPARK-19464][BUILD][HOTFIX] Add back mockito test dep i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16853 **[Test build #72583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72583/testReport)** for PR 16853 at commit [`c791fdb`](https://github.com/apache/spark/commit/c791fdb8abdcda60bb3c06fe06cca7f77ea9bdc6). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16853: [SPARK-19464][BUILD][HOTFIX] Add back mockito test dep i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16853 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72583/ Test FAILed.
[GitHub] spark issue #16853: [SPARK-19464][BUILD][HOTFIX] Add back mockito test dep i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16853 **[Test build #72583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72583/testReport)** for PR 16853 at commit [`c791fdb`](https://github.com/apache/spark/commit/c791fdb8abdcda60bb3c06fe06cca7f77ea9bdc6).
[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16837 Merged build finished. Test PASSed.
[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16837 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72578/ Test PASSed.
[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16837 **[Test build #72578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72578/testReport)** for PR 16837 at commit [`329886e`](https://github.com/apache/spark/commit/329886e54d3a70e2314d67b6b6060fc33cef9b8d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16810: [SPARK-19464][CORE][YARN][test-hadoop2.6] Remove ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16810#discussion_r100065275 --- Diff: resource-managers/yarn/pom.xml --- @@ -125,34 +125,12 @@ test - - org.apache.hadoop hadoop-yarn-server-tests tests test - --- End diff -- Oops, mockito ended up being necessary, though only according to the Maven build
[GitHub] spark pull request #16853: [SPARK-19464][BUILD][HOTFIX] Add back mockito tes...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/16853
[SPARK-19464][BUILD][HOTFIX] Add back mockito test dep in YARN module, as it ends up being required in a Maven build
Add back mockito test dep in YARN module, as it ends up being required in a Maven build
## How was this patch tested?
PR builder again, but also a local `mvn` run using the command that the broken Jenkins job uses
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/srowen/spark SPARK-19464.2
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/16853.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #16853
commit c791fdb8abdcda60bb3c06fe06cca7f77ea9bdc6
Author: Sean Owen
Date: 2017-02-08T13:31:51Z
    Add back mockito test dep in YARN module, as it ends up being required in a Maven build
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Merged build finished. Test PASSed.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16787 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72580/ Test PASSed.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72580 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72580/testReport)** for PR 16787 at commit [`a3c9f5e`](https://github.com/apache/spark/commit/a3c9f5e4a754ceee2ffb71c3da49221001b1bf2c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100064532
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -298,22 +312,22 @@ class JacksonParser(
     // Here, we pass empty `PartialFunction` so that this case can be
     // handled as a failed conversion. It will throw an exception as
     // long as the value is not null.
-    parseJsonToken(parser, dataType)(PartialFunction.empty[JsonToken, Any])
+    parseJsonToken[AnyRef](parser, dataType)(PartialFunction.empty[JsonToken, AnyRef])
   }

   /**
    * This method skips `FIELD_NAME`s at the beginning, and handles nulls ahead before trying
    * to parse the JSON token using given function `f`. If the `f` failed to parse and convert the
    * token, call `failedConversion` to handle the token.
    */
-  private def parseJsonToken(
+  private def parseJsonToken[R >: Null](
--- End diff --
What does `>: Null` mean?
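For readers with the same question: in Scala, `R >: Null` is a lower type bound, requiring `Null` to be a subtype of `R`, so that `null` is a legal value of type `R`. The sketch below illustrates only the language feature; `valueOrNull` and the object name are hypothetical and are not part of `JacksonParser`. That the patch wants `parseJsonToken` to be able to return `null` for JSON nulls is an assumption drawn from the doc comment in the diff.

```scala
object LowerBoundNullSketch {
  // `R >: Null` is a lower bound: Null must be a subtype of R,
  // so `null` type-checks as a value of R inside the method.
  // (Hypothetical helper, for illustration only.)
  def valueOrNull[R >: Null](value: Option[R]): R = value match {
    case Some(v) => v
    case None    => null
  }

  def main(args: Array[String]): Unit = {
    // Reference types such as java.lang.Long satisfy the bound.
    val missing: java.lang.Long = valueOrNull[java.lang.Long](None)
    println(missing == null)

    val present: java.lang.Long = valueOrNull(Some(Long.box(42L)))
    println(present)

    // valueOrNull[Long](None) would be rejected by the compiler:
    // scala.Long is a value type, and Null is not a subtype of it.
  }
}
```

This is also a plausible reason the same patch instantiates the method with `AnyRef` and `java.lang.Long` rather than `Any` and `Long`, since value types cannot satisfy the bound.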
[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100064266
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -227,66 +267,71 @@ class JacksonParser(
       }

     case TimestampType =>
-      (parser: JsonParser) => parseJsonToken(parser, dataType) {
+      (parser: JsonParser) => parseJsonToken[java.lang.Long](parser, dataType) {
         case VALUE_STRING =>
+          val stringValue = parser.getText
           // This one will lose microseconds parts.
           // See https://issues.apache.org/jira/browse/SPARK-10681.
-          Try(options.timestampFormat.parse(parser.getText).getTime * 1000L)
-            .getOrElse {
-              // If it fails to parse, then tries the way used in 2.0 and 1.x for backwards
-              // compatibility.
-              DateTimeUtils.stringToTime(parser.getText).getTime * 1000L
-            }
+          Long.box {
--- End diff --
I don't think this makes the code more readable...
[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16837 LGTM
[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100063010
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -160,7 +164,17 @@ public void writeTo(OutputStream out) throws IOException {
         throw new ArrayIndexOutOfBoundsException();
       }

-      out.write(bytes, (int) arrayOffset, numBytes);
+      return ByteBuffer.wrap(bytes, (int) arrayOffset, numBytes);
+    } else {
+      return null;
--- End diff --
Would it be more consistent if we returned `ByteBuffer.wrap(getBytes)` here?
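The suggestion above can be sketched as follows, under stated assumptions: the method in the diff returns a view over the backing array when one is available, and the reviewer proposes wrapping a copy of the bytes instead of returning `null` otherwise. `asByteBuffer`, its parameters, and the object name are hypothetical stand-ins and not the `UTF8String` API; only `java.nio.ByteBuffer.wrap` is a real call.

```scala
import java.nio.ByteBuffer

object ByteBufferWrapSketch {
  // Hypothetical stand-in for the method under review.
  // `onHeap` is Some(array) when the string is backed by a byte array,
  // None otherwise; `copy` lazily produces a fresh copy of the bytes.
  def asByteBuffer(onHeap: Option[Array[Byte]], offset: Int, numBytes: Int)(
      copy: => Array[Byte]): ByteBuffer =
    onHeap match {
      // View over the existing array: no copy, position = offset.
      case Some(bytes) => ByteBuffer.wrap(bytes, offset, numBytes)
      // The reviewer's alternative to `return null`: wrap a copy.
      case None => ByteBuffer.wrap(copy)
    }

  def main(args: Array[String]): Unit = {
    val backing = "hello world".getBytes("UTF-8")

    val view = asByteBuffer(Some(backing), 6, 5)(backing.slice(6, 11))
    println(view.remaining())

    val copied = asByteBuffer(None, 0, 0)(backing.slice(6, 11))
    println(copied.remaining())
  }
}
```

With this shape callers always receive a non-null `ByteBuffer` holding the same number of readable bytes, at the cost of one copy in the fallback branch.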
[GitHub] spark issue #16373: [SPARK-18961][SQL] Support `SHOW TABLE EXTENDED ... PART...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16373 Merged build finished. Test PASSed.