[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19130 **[Test build #81665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81665/testReport)** for PR 19130 at commit [`4bbc09d`](https://github.com/apache/spark/commit/4bbc09d68c21496d97be3e2d9f781e7ca0bbf7e7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19199#discussion_r138268612 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala --- @@ -109,6 +109,20 @@ class CSVFileFormat extends TextBasedFileFormat with DataSourceRegister { } } +if (requiredSchema.length == 1 && + requiredSchema.head.name == parsedOptions.columnNameOfCorruptRecord) { + throw new AnalysisException( +"Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the\n" + + "referenced columns only include the internal corrupt record column\n" + + s"(named ${parsedOptions.columnNameOfCorruptRecord} by default). For example:\n" + + "spark.read.schema(schema).json(file).filter($\"_corrupt_record\".isNotNull).count()\n" + + "and spark.read.schema(schema).json(file).select(\"_corrupt_record\").show().\n" + --- End diff -- We'd better use a CSV example here instead of a JSON one.
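For reference, the CSV analogue of the JSON examples quoted in the message would presumably read as follows. This is only an illustrative sketch, not part of the patch; it assumes an active SparkSession `spark`, a user-supplied `schema`, and a CSV `file` path:

```scala
// Hypothetical illustration of the disallowed pattern, written against CSV.
import spark.implicits._

// Both queries reference only the internal corrupt record column, so the
// check in the diff above would reject them with the AnalysisException:
spark.read.schema(schema).csv(file).filter($"_corrupt_record".isNotNull).count()
spark.read.schema(schema).csv(file).select("_corrupt_record").show()
```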
[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19130 Jenkins, retest this please.
[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16422 **[Test build #81664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81664/testReport)** for PR 16422 at commit [`0d49ee9`](https://github.com/apache/spark/commit/0d49ee91508c908daef672a04768c15a9e5c5dba).
[GitHub] spark issue #19198: [MINOR][DOC] Add missing call of `update()` in examples ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19198 **[Test build #81663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81663/testReport)** for PR 19198 at commit [`6f3859c`](https://github.com/apache/spark/commit/6f3859c38392c9d1e5b5be9883610ecb26513736).
[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16422 retest this please
[GitHub] spark issue #19198: [MINOR][DOC] Add missing call of `update()` in examples ...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/19198 retest this please
[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16422 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81655/ Test FAILed.
[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19130 Merged build finished. Test FAILed.
[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16422 Merged build finished. Test FAILed.
[GitHub] spark issue #19198: [MINOR][DOC] Add missing call of `update()` in examples ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19198 Merged build finished. Test FAILed.
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19185 Merged build finished. Test FAILed.
[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19130 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81658/ Test FAILed.
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19185 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81660/ Test FAILed.
[GitHub] spark issue #19198: [MINOR][DOC] Add missing call of `update()` in examples ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19198 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81659/ Test FAILed.
[GitHub] spark issue #19118: [SPARK-21882][CORE] OutputMetrics doesn't count written ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19118 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81662/ Test FAILed.
[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19130 **[Test build #81658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81658/testReport)** for PR 19130 at commit [`4bbc09d`](https://github.com/apache/spark/commit/4bbc09d68c21496d97be3e2d9f781e7ca0bbf7e7). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19185 **[Test build #81660 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81660/testReport)** for PR 19185 at commit [`eb8f6b4`](https://github.com/apache/spark/commit/eb8f6b431982d6f1f0118965391560f94812ab53). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19198: [MINOR][DOC] Add missing call of `update()` in examples ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19198 **[Test build #81659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81659/testReport)** for PR 19198 at commit [`6f3859c`](https://github.com/apache/spark/commit/6f3859c38392c9d1e5b5be9883610ecb26513736). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16422 **[Test build #81655 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81655/testReport)** for PR 16422 at commit [`0d49ee9`](https://github.com/apache/spark/commit/0d49ee91508c908daef672a04768c15a9e5c5dba). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19118: [SPARK-21882][CORE] OutputMetrics doesn't count written ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19118 Merged build finished. Test FAILed.
[GitHub] spark issue #19118: [SPARK-21882][CORE] OutputMetrics doesn't count written ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19118 **[Test build #81662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81662/testReport)** for PR 19118 at commit [`2c4f2ca`](https://github.com/apache/spark/commit/2c4f2ca7f92916114d090208091ba718da5621c6). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19199 **[Test build #81661 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81661/testReport)** for PR 19199 at commit [`e703fc8`](https://github.com/apache/spark/commit/e703fc8f33d1fde90d790057481f1d23f466f378).
[GitHub] spark issue #19118: [SPARK-21882][CORE] OutputMetrics doesn't count written ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19118 **[Test build #81662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81662/testReport)** for PR 19118 at commit [`2c4f2ca`](https://github.com/apache/spark/commit/2c4f2ca7f92916114d090208091ba718da5621c6).
[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...
Github user jmchung commented on the issue: https://github.com/apache/spark/pull/19199 cc @gatorsmile, @HyukjinKwon and @viirya. Could you guys help to review this? Thanks.
[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19199 ok to test
[GitHub] spark issue #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not han...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19199 Can one of the admins verify this patch?
[GitHub] spark pull request #19199: [SPARK-21610][SQL][FOLLOWUP] Corrupt records are ...
GitHub user jmchung opened a pull request: https://github.com/apache/spark/pull/19199 [SPARK-21610][SQL][FOLLOWUP] Corrupt records are not handled properly when creating a dataframe from a file ## What changes were proposed in this pull request? When the `requiredSchema` only contains `_corrupt_record`, the derived `actualSchema` is empty and the `_corrupt_record` values are null for all rows. This PR captures the above situation and raises an exception with a reasonable workaround message so that users can know what happened and how to fix the query. ## How was this patch tested? Added a unit test in `CSVSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jmchung/spark SPARK-21610-FOLLOWUP Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19199.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19199 commit e703fc8f33d1fde90d790057481f1d23f466f378 Author: Jen-Ming Chung Date: 2017-09-12T06:48:33Z follow-up PR for CSV
[GitHub] spark issue #19118: [SPARK-21882][CORE] OutputMetrics doesn't count written ...
Github user awarrior commented on the issue: https://github.com/apache/spark/pull/19118 @jiangxb1987 well, I got past that part above, but there are other initialization chances before runJob. They are in the write function of SparkHadoopWriter. > // Assert the output format/key/value class is set in JobConf. config.assertConf(jobContext, rdd.conf) <= chance val committer = config.createCommitter(stageId) committer.setupJob(jobContext) <= chance // Try to write all RDD partitions as a Hadoop OutputFormat. try { val ret = sparkContext.runJob(rdd, (context: TaskContext, iter: Iterator[(K, V)]) => { executeTask( context = context, config = config, jobTrackerId = jobTrackerId, sparkStageId = context.stageId, sparkPartitionId = context.partitionId, sparkAttemptNumber = context.attemptNumber, committer = committer, iterator = iter) }) One example trace: > java.lang.Thread.State: RUNNABLE at org.apache.hadoop.fs.FileSystem.getStatistics(FileSystem.java:3270) - locked <0x126a> (a java.lang.Class) at org.apache.hadoop.fs.FileSystem.initialize(FileSystem.java:202) at org.apache.hadoop.fs.RawLocalFileSystem.initialize(RawLocalFileSystem.java:92) at org.apache.hadoop.fs.LocalFileSystem.initialize(LocalFileSystem.java:47) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2598) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:354) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:91) at org.apache.hadoop.mapred.FileOutputCommitter.getWrapped(FileOutputCommitter.java:65) at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:131) at org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:233) at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupJob(HadoopMapReduceCommitProtocol.scala:125) at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:74)
[GitHub] spark pull request #19118: [SPARK-21882][CORE] OutputMetrics doesn't count w...
Github user awarrior commented on a diff in the pull request: https://github.com/apache/spark/pull/19118#discussion_r138263099 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -112,11 +112,12 @@ object SparkHadoopWriter extends Logging { jobTrackerId, sparkStageId, sparkPartitionId, sparkAttemptNumber) committer.setupTask(taskContext) -val (outputMetrics, callback) = initHadoopOutputMetrics(context) - // Initiate the writer. config.initWriter(taskContext, sparkPartitionId) var recordsWritten = 0L + +// Initialize callback function after the writer. --- End diff -- ok
[GitHub] spark pull request #19141: [SPARK-21384] [YARN] Spark + YARN fails with Loca...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/19141#discussion_r138262309 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -565,7 +565,6 @@ private[spark] class Client( distribute(jarsArchive.toURI.getPath, resType = LocalResourceType.ARCHIVE, destName = Some(LOCALIZED_LIB_DIR)) - jarsArchive.delete() --- End diff -- What if your scenario and SPARK-20741's scenario are both encountered? It looks like your approach above cannot work then. I'm wondering if we can copy or move this __spark_libs__.zip temp file to another non-temp file and add that file to the dist cache. That non-temp file will not be deleted and can be overwritten during another launch, so we will always have only one copy. Besides, I think we have several workarounds to handle this issue, like spark.yarn.jars or spark.yarn.archive, so this corner case does not seem strictly necessary to fix (just my thinking; normally people will not use the local FS in a real cluster).
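The copy-to-a-stable-path idea above can be sketched roughly as follows. This is only an illustrative sketch: `stableArchive` and its location are hypothetical, and `jarsArchive`, `distribute`, `sparkConf`, and `LOCALIZED_LIB_DIR` are assumed to be in scope as in the surrounding Client code:

```scala
// Sketch: instead of deleting the temp __spark_libs__.zip, copy it to a
// stable location that a later launch simply overwrites, so there is
// always exactly one non-temp copy on disk.
import java.nio.file.{Files, Paths, StandardCopyOption}

val stableArchive = Paths.get(
  sparkConf.get("spark.local.dir", "/tmp"), "__spark_libs__.zip")  // hypothetical destination
Files.copy(jarsArchive.toPath, stableArchive, StandardCopyOption.REPLACE_EXISTING)

// Distribute the stable copy rather than the temp file, so deleting the
// temp file (the SPARK-20741 concern) no longer races with localization.
distribute(stableArchive.toUri.getPath, resType = LocalResourceType.ARCHIVE,
  destName = Some(LOCALIZED_LIB_DIR))
```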
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138254795 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ReadTask.java --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import java.io.Serializable; + +/** + * A read task returned by a data source reader and is responsible to create the data reader. + * The relationship between `ReadTask` and `DataReader` is similar to `Iterable` and `Iterator`. + * + * Note that, the read task will be serialized and sent to executors, then the data reader will be + * created on executors and do the actual reading. + */ +public interface ReadTask extends Serializable { + /** + * The preferred locations for this read task to run faster, but Spark can't guarantee that this --- End diff -- `locations for this read task to run faster` -> `locations where this read task can run faster`
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138254289 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataSourceV2Reader.java --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import java.util.List; + +import org.apache.spark.sql.Row; + +import org.apache.spark.sql.types.StructType; + +/** + * A data source reader that can mix in various query optimization interfaces and implement these + * optimizations. The actual scan logic should be delegated to `ReadTask`s that are returned by + * this data source reader. + * + * There are mainly 3 kinds of query optimizations: + * 1. push operators downward to the data source, e.g., column pruning, filter push down, etc. + * 2. propagate information upward to Spark, e.g., report statistics, report ordering, etc. + * 3. special scans like columnar scan, unsafe row scan, etc. Note that a data source reader can + * at most implement one special scan. + * + * Spark first applies all operator push down optimizations which this data source supports. Then + * Spark collects information this data source provides for further optimizations. Finally Spark + * issues the scan request and does the actual data reading. */ +public interface DataSourceV2Reader { + + /** + * Returns the actual schema of this data source reader, which may be different from the physical + * schema of the underlying storage, as column pruning or other optimizations may happen. + */ + StructType readSchema(); + + /** + * Returns a list of read tasks, each task is responsible for outputting data for one RDD + * partition, which means the number of tasks returned here is same as the number of RDD --- End diff -- `, which means` -> `That means`
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138253471 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataSourceV2Reader.java --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import java.util.List; + +import org.apache.spark.sql.Row; +import org.apache.spark.sql.types.StructType; + +/** + * A data source reader that can mix in various query optimization interfaces and implement these + * optimizations. The actual scan logic should be delegated to `ReadTask`s that are returned by + * this data source reader. + * + * There are mainly 3 kinds of query optimizations: + * 1. push operators downward to the data source, e.g., column pruning, filter push down, etc. + * 2. propagate information upward to Spark, e.g., report statistics, report ordering, etc. + * 3. special scans like columnar scan, unsafe row scan, etc. Note that a data source reader can + * at most implement one special scan. --- End diff -- `at most implement one` -> ` implement at most one`
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138252631 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java --- @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import java.io.Closeable; + +/** + * A data reader returned by a read task and is responsible for outputting data for an RDD --- End diff -- Nit: `an` -> `a`
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138255522 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/downward/ColumnPruningSupport.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader.downward; + +import org.apache.spark.sql.types.StructType; + +/** + * A mix-in interface for `DataSourceV2Reader`. Users can implement this interface to only read + * required columns/nested fields during scan. + */ +public interface ColumnPruningSupport { + + /** + * Apply column pruning w.r.t. the given requiredSchema. + * + * Implementation should try its best to prune unnecessary columns/nested fields, but it's also --- End diff -- `the unnecessary `
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138254894 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ReadTask.java --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import java.io.Serializable; + +/** + * A read task returned by a data source reader and is responsible to create the data reader. + * The relationship between `ReadTask` and `DataReader` is similar to `Iterable` and `Iterator`. + * + * Note that, the read task will be serialized and sent to executors, then the data reader will be + * created on executors and do the actual reading. + */ +public interface ReadTask extends Serializable { + /** + * The preferred locations for this read task to run faster, but Spark can't guarantee that this --- End diff -- `can't guarantee` -> `does not guarantee`
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138254192 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataSourceV2Reader.java --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import java.util.List; + +import org.apache.spark.sql.Row; +import org.apache.spark.sql.types.StructType; + +/** + * A data source reader that can mix in various query optimization interfaces and implement these + * optimizations. The actual scan logic should be delegated to `ReadTask`s that are returned by + * this data source reader. + * + * There are mainly 3 kinds of query optimizations: + * 1. push operators downward to the data source, e.g., column pruning, filter push down, etc. + * 2. propagate information upward to Spark, e.g., report statistics, report ordering, etc. + * 3. special scans like columnar scan, unsafe row scan, etc. Note that a data source reader can + * at most implement one special scan. + * + * Spark first applies all operator push down optimizations which this data source supports. Then + * Spark collects information this data source provides for further optimizations. Finally Spark + * issues the scan request and does the actual data reading. */ +public interface DataSourceV2Reader { + + /** + * Returns the actual schema of this data source reader, which may be different from the physical + * schema of the underlying storage, as column pruning or other optimizations may happen. + */ + StructType readSchema(); + + /** + * Returns a list of read tasks, each task is responsible for outputting data for one RDD --- End diff -- `, each` -> `. Each` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
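The javadoc under review can be made concrete with a small sketch. Only `DataSourceV2Reader` and `ReadTask` come from the diff above; the method name `createReadTasks`, the generic type parameter, and the plain Java types standing in for Spark's `StructType`/`Row` are assumptions made here so the sketch is self-contained, not part of the proposed API.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in for the ReadTask in the diff: one task produces the data of one
// RDD partition, and the task itself must be serializable so it can be
// shipped to executors.
interface ReadTask<T> extends Serializable {
    Iterator<T> createReader();
}

// Stand-in for DataSourceV2Reader; readSchema() returns column names here
// instead of a StructType.
interface DataSourceV2Reader<T> {
    List<String> readSchema();
    List<ReadTask<T>> createReadTasks(); // one task per output partition
}

// A toy reader over two in-memory "partitions".
class InMemoryReader implements DataSourceV2Reader<String> {
    public List<String> readSchema() {
        return Arrays.asList("value");
    }

    public List<ReadTask<String>> createReadTasks() {
        return Arrays.<ReadTask<String>>asList(
            () -> Arrays.asList("a", "b").iterator(), // partition 0
            () -> Arrays.asList("c").iterator());     // partition 1
    }
}

public class ReaderDemo {
    public static void main(String[] args) {
        DataSourceV2Reader<String> reader = new InMemoryReader();
        int rows = 0;
        for (ReadTask<String> task : reader.createReadTasks()) {
            Iterator<String> it = task.createReader();
            while (it.hasNext()) { it.next(); rows++; }
        }
        System.out.println(reader.createReadTasks().size()); // 2 partitions
        System.out.println(rows);                            // 3 rows
    }
}
```

The number of tasks returned is exactly the number of RDD partitions the scan produces, which is the relationship the quoted javadoc describes.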
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138254426 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataSourceV2Reader.java --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import java.util.List; + +import org.apache.spark.sql.Row; +import org.apache.spark.sql.types.StructType; + +/** + * A data source reader that can mix in various query optimization interfaces and implement these + * optimizations. The actual scan logic should be delegated to `ReadTask`s that are returned by + * this data source reader. + * + * There are mainly 3 kinds of query optimizations: + * 1. push operators downward to the data source, e.g., column pruning, filter push down, etc. + * 2. propagate information upward to Spark, e.g., report statistics, report ordering, etc. + * 3. special scans like columnar scan, unsafe row scan, etc. Note that a data source reader can + * at most implement one special scan. + * + * Spark first applies all operator push down optimizations which this data source supports. Then + * Spark collects information this data source provides for further optimizations. Finally Spark + * issues the scan request and does the actual data reading. */ +public interface DataSourceV2Reader { + + /** + * Returns the actual schema of this data source reader, which may be different from the physical + * schema of the underlying storage, as column pruning or other optimizations may happen. + */ + StructType readSchema(); + + /** + * Returns a list of read tasks, each task is responsible for outputting data for one RDD + * partition, which means the number of tasks returned here is same as the number of RDD + * partitions this scan outputs. + * + * Note that, this may not be a full scan if the data source reader mixes in other optimization + * interfaces like column pruning, filter push down, etc. These optimizations are applied before --- End diff -- `push down` -> `push-down`
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138255191 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/downward/CatalystFilterPushDownSupport.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader.downward; + +import org.apache.spark.annotation.Experimental; +import org.apache.spark.annotation.InterfaceStability; +import org.apache.spark.sql.catalyst.expressions.Expression; + +/** + * A mix-in interface for `DataSourceV2Reader`. Users can implement this interface to push down + * arbitrary expressions as predicates to the data source. + */ +@Experimental +@InterfaceStability.Unstable +public interface CatalystFilterPushDownSupport { + + /** + * Push down filters, returns unsupported filters. --- End diff -- `Pushes down filters, and returns unsupported filters.` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138253590 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataSourceV2Reader.java --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import java.util.List; + +import org.apache.spark.sql.Row; +import org.apache.spark.sql.types.StructType; + +/** + * A data source reader that can mix in various query optimization interfaces and implement these + * optimizations. The actual scan logic should be delegated to `ReadTask`s that are returned by + * this data source reader. + * + * There are mainly 3 kinds of query optimizations: + * 1. push operators downward to the data source, e.g., column pruning, filter push down, etc. + * 2. propagate information upward to Spark, e.g., report statistics, report ordering, etc. + * 3. special scans like columnar scan, unsafe row scan, etc. Note that a data source reader can + * at most implement one special scan. + * + * Spark first applies all operator push down optimizations which this data source supports. Then --- End diff -- `push down` -> `push-down`
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138258962 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/upward/StatisticsSupport.java --- @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader.upward; + +/** + * A mix in interface for `DataSourceV2Reader`. Users can implement this interface to report + * statistics to Spark. + */ +public interface StatisticsSupport { + Statistics getStatistics(); --- End diff -- Will the returned stats be adjusted by the data sources based on the operator push-down? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
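The question above (whether `getStatistics` should reflect operator push-down) is open in the PR; one possible source-side behavior is sketched below. Everything except the `StatisticsSupport`/`Statistics` names from the quoted diff is invented for illustration: the `numRows`/`sizeInBytes` accessors, the selectivity bookkeeping, and the class names are all assumptions, not the proposed API.

```java
// Stand-in for the Statistics type referenced by the quoted interface.
interface Statistics {
    long numRows();
    long sizeInBytes();
}

// Stand-in for the mix-in interface from the diff above.
interface StatisticsSupport {
    Statistics getStatistics();
}

// A hypothetical source that discounts its reported row count by the
// estimated selectivity of filters that have already been pushed into it.
class SelectivityAwareSource implements StatisticsSupport {
    private final long totalRows;
    private final long bytesPerRow;
    private double selectivity = 1.0;

    SelectivityAwareSource(long totalRows, long bytesPerRow) {
        this.totalRows = totalRows;
        this.bytesPerRow = bytesPerRow;
    }

    // Called when a filter is pushed down, with an estimate of the
    // fraction of rows the filter retains.
    void recordPushedFilter(double estimatedSelectivity) {
        selectivity *= estimatedSelectivity;
    }

    public Statistics getStatistics() {
        final long rows = (long) (totalRows * selectivity);
        return new Statistics() {
            public long numRows() { return rows; }
            public long sizeInBytes() { return rows * bytesPerRow; }
        };
    }
}
```

Under this sketch the answer to the review question would be "yes": the stats shrink as filters are pushed, which is what a cost-based planner would want to consume.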
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138255810 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/downward/FilterPushDownSupport.java --- @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader.downward; + +import org.apache.spark.sql.sources.Filter; + +/** + * A mix-in interface for `DataSourceV2Reader`. Users can implement this interface to push down + * filters to the data source and reduce the size of the data to be read. + */ +public interface FilterPushDownSupport { + + /** + * Push down filters, returns unsupported filters. --- End diff -- `Pushes down filters, and returns unsupported filters.` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
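The contract gatorsmile is rephrasing ("pushes down filters, and returns unsupported filters") can be sketched as follows. `FilterPushDownSupport` and the `Filter` hierarchy mirror the quoted diff and Spark's public `org.apache.spark.sql.sources` filters, but the method name `pushFilters`, its exact signature, and the toy source class are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins for Spark's public Filter hierarchy.
abstract class Filter {}

class EqualTo extends Filter {
    final String attribute; final Object value;
    EqualTo(String attribute, Object value) { this.attribute = attribute; this.value = value; }
}

class GreaterThan extends Filter {
    final String attribute; final Object value;
    GreaterThan(String attribute, Object value) { this.attribute = attribute; this.value = value; }
}

// Stand-in for the mix-in interface from the diff above.
interface FilterPushDownSupport {
    // Pushes down filters, and returns the filters the source cannot
    // handle; Spark must still evaluate the returned ones itself.
    Filter[] pushFilters(Filter[] filters);
}

// A hypothetical source that can only evaluate equality predicates natively.
class EqualityOnlySource implements FilterPushDownSupport {
    final List<Filter> pushed = new ArrayList<>();

    public Filter[] pushFilters(Filter[] filters) {
        List<Filter> unsupported = new ArrayList<>();
        for (Filter f : filters) {
            if (f instanceof EqualTo) {
                pushed.add(f);        // evaluated inside the source's scan
            } else {
                unsupported.add(f);   // handed back to Spark
            }
        }
        return unsupported.toArray(new Filter[0]);
    }
}
```

The key point of the contract is that the return value is the *residual*: anything not returned is the source's responsibility, anything returned gets a normal Spark `Filter` operator on top of the scan.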
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138255388 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/downward/ColumnPruningSupport.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader.downward; + +import org.apache.spark.sql.types.StructType; + +/** + * A mix-in interface for `DataSourceV2Reader`. Users can implement this interface to only read + * required columns/nested fields during scan. --- End diff -- -> `the required` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138255263 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/downward/CatalystFilterPushDownSupport.java --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader.downward; + +import org.apache.spark.annotation.Experimental; +import org.apache.spark.annotation.InterfaceStability; +import org.apache.spark.sql.catalyst.expressions.Expression; + +/** + * A mix-in interface for `DataSourceV2Reader`. Users can implement this interface to push down + * arbitrary expressions as predicates to the data source. --- End diff -- `Note that, this is an experimental and unstable interface` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19136: [SPARK-15689][SQL] data source v2
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19136#discussion_r138254904 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ReadTask.java --- @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.sources.v2.reader; + +import java.io.Serializable; + +/** + * A read task returned by a data source reader and is responsible to create the data reader. + * The relationship between `ReadTask` and `DataReader` is similar to `Iterable` and `Iterator`. + * + * Note that, the read task will be serialized and sent to executors, then the data reader will be + * created on executors and do the actual reading. + */ +public interface ReadTask extends Serializable { + /** + * The preferred locations for this read task to run faster, but Spark can't guarantee that this + * task will always run on these locations. Implementations should make sure that it can --- End diff -- `Implementations ` -> `The Implementation` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19186 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81651/ Test PASSed.
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19186 Merged build finished. Test PASSed.
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19186 **[Test build #81651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81651/testReport)** for PR 19186 at commit [`9e53579`](https://github.com/apache/spark/commit/9e53579b4e8e69761f5a6c89cc60ab179ff78ea6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19196: [SPARK-21977] SinglePartition optimizations break certai...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19196 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81657/ Test FAILed.
[GitHub] spark issue #19196: [SPARK-21977] SinglePartition optimizations break certai...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19196 **[Test build #81657 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81657/testReport)** for PR 19196 at commit [`12cf02a`](https://github.com/apache/spark/commit/12cf02a10ff7219f1ed405c37c2ac87c65a6c798). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19196: [SPARK-21977] SinglePartition optimizations break certai...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19196 Merged build finished. Test FAILed.
[GitHub] spark issue #19185: [Spark-21854] Added LogisticRegressionTrainingSummary fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19185 **[Test build #81660 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81660/testReport)** for PR 19185 at commit [`eb8f6b4`](https://github.com/apache/spark/commit/eb8f6b431982d6f1f0118965391560f94812ab53).
[GitHub] spark issue #19198: [MINOR][DOC] Add missing call of `update()` in examples ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19198 **[Test build #81659 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81659/testReport)** for PR 19198 at commit [`6f3859c`](https://github.com/apache/spark/commit/6f3859c38392c9d1e5b5be9883610ecb26513736).
[GitHub] spark issue #19197: [SPARK-18608][ML] Fix double caching
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19197 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81654/ Test PASSed.
[GitHub] spark issue #19197: [SPARK-18608][ML] Fix double caching
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19197 Merged build finished. Test PASSed.
[GitHub] spark issue #19197: [SPARK-18608][ML] Fix double caching
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19197 **[Test build #81654 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81654/testReport)** for PR 19197 at commit [`b485614`](https://github.com/apache/spark/commit/b4856147e04c3d57f2bfc70c70e3f136f46fa873). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19186 Merged build finished. Test PASSed.
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19186 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81650/ Test PASSed.
[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19186 **[Test build #81650 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81650/testReport)** for PR 19186 at commit [`e112b42`](https://github.com/apache/spark/commit/e112b42a2df231f6b200bcb0cd3759cc143c8c80). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19130 **[Test build #81658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81658/testReport)** for PR 19130 at commit [`4bbc09d`](https://github.com/apache/spark/commit/4bbc09d68c21496d97be3e2d9f781e7ca0bbf7e7).
[GitHub] spark pull request #19198: [MINOR][DOC] Add missing call of `update()` in ex...
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/19198 [MINOR][DOC] Add missing call of `update()` in examples of PeriodicGraphCheckpointer & PeriodicRDDCheckpointer ## What changes were proposed in this pull request? forgot to call `update()` with `graph1` & `rdd1` in examples for `PeriodicGraphCheckpointer` & `PeriodicRDDCheckpointer` ## How was this patch tested? existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengruifeng/spark fix_doc_checkpointer Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19198.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19198 commit 6f3859c38392c9d1e5b5be9883610ecb26513736 Author: Zheng RuiFeng Date: 2017-09-12T05:59:25Z create pr
[GitHub] spark issue #19130: [SPARK-21917][CORE][YARN] Supporting adding http(s) reso...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19130 @tgravescs, thanks for your comments. Can you review again and check whether it is what you expected?
[GitHub] spark issue #9168: [SPARK-11182] HDFS Delegation Token will be expired when ...
Github user jackiehff commented on the issue: https://github.com/apache/spark/pull/9168 @marsishandsome, the patch from JIRA HDFS-9276 has been introduced into hadoop-2.6.0-cdh5.7.3, which is our Hadoop cluster version, but I still get the "token can't be found in cache" error when running a Spark Streaming job. So my question is whether the Spark source code also needs to be modified according to your pull request? By the way, I also used the configuration "--conf spark.hadoop.fs.hdfs.impl.disable.cache=true", but it didn't work.
[GitHub] spark issue #19196: [SPARK-21977] SinglePartition optimizations break certai...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19196 **[Test build #81657 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81657/testReport)** for PR 19196 at commit [`12cf02a`](https://github.com/apache/spark/commit/12cf02a10ff7219f1ed405c37c2ac87c65a6c798).