[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...
Github user maropu closed the pull request at: https://github.com/apache/spark/pull/14038

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r118045500

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala ---
@@ -31,6 +31,34 @@ import org.apache.spark.sql.types.StructType

 /**
+ * A filter class to list up qualified paths in parallel.
+ */
+abstract class PathFilter extends Serializable {
+  final def accept(path: Path): Boolean = isDataPath(path) || isMetaDataPath(path)
+  def isDataPath(path: Path): Boolean = false
+  def isMetaDataPath(path: Path): Boolean = false
+}
+
+object PathFilter {
+
+  /** A default path filter to pass through all input paths. */
+  val defaultPathFilter = new PathFilter {
+
+    override def isDataPath(path: Path): Boolean = {
+      // We filter out the following paths:
+      // 1. everything that starts with _ or ., except _common_metadata and _metadata,
+      //    because Parquet needs to find those metadata files among the leaf files returned
+      //    by this method. We should refactor this logic to not mix metadata files with data files.
+      // 2. everything that ends with `._COPYING_`, because this is an intermediate state of a
+      //    file; we should skip such files to avoid double reading.
+      val name = path.getName
+      !((name.startsWith("_") && !name.contains("=")) || name.startsWith(".") ||
+        name.endsWith("._COPYING_"))
+    }
+  }
+}
--- End diff --

Like @rxin said, this sounds risky to me too.
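The name-based rule quoted in the diff above can be sketched as a standalone predicate. This is only an illustration: `PathFilterSketch` and its `isDataPath` over plain `String` file names are hypothetical stand-ins, not the real Spark API, which operates on Hadoop `Path` objects.

```scala
// Standalone sketch of the name-based filtering rule quoted in the diff above.
// `isDataPath` here takes a bare file name (String), not a Hadoop Path.
object PathFilterSketch {
  def isDataPath(name: String): Boolean =
    !((name.startsWith("_") && !name.contains("=")) || name.startsWith(".") ||
      name.endsWith("._COPYING_"))

  def main(args: Array[String]): Unit = {
    assert(isDataPath("part-00000.parquet"))  // regular data file passes
    assert(isDataPath("p1=2"))                // partition directory names contain '=' and pass
    assert(!isDataPath("_SUCCESS"))           // job marker files are filtered out
    assert(!isDataPath(".temp"))              // hidden files are filtered out
    assert(!isDataPath("file1._COPYING_"))    // in-flight copies are filtered out
    println("ok")
  }
}
```

Note that, as written, the rule also rejects `_metadata` and `_common_metadata`, which is exactly the mismatch with the comment that makes the reviewers call it risky.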
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r89839965

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala ---
@@ -441,6 +441,44 @@ class FileSourceStrategySuite extends QueryTest with SharedSQLContext with Predi
   }
 }

+  test("filter out invalid files in a driver") {
+    withSQLConf(
+      "fs.file.impl" -> classOf[MockDistributedFileSystem].getName,
+      SQLConf.PARALLEL_PARTITION_DISCOVERY_THRESHOLD.key -> "3") {
+      val table =
+        createTable(
+          files = Seq(
+            "p1=1/file1" -> 1,
+            "p1=1/file2" -> 1,
+            "p1=2/file3" -> 1,
+            "p1=2/invalid_file" -> 1))
--- End diff --

I'd consider adding the full set of invalid files:

```
p1=2/file=3 -> 1
p1=2/.temp -> 1
```
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69713606

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala ---
@@ -156,3 +162,10 @@ class ListingFileCatalog(
   override def hashCode(): Int = paths.toSet.hashCode()
 }
+
+object ListingFileCatalog {
+  /** A default path filter to pass through all input paths. */
+  val passThroughPathFilter = new SerializablePathFilter {
+    override def accept(path: Path): Boolean = true
+  }
--- End diff --

Do the latest fixes satisfy your intention?
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69713543

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala ---
@@ -30,6 +29,13 @@ import org.apache.spark.sql.types.StructType

 /**
+ * A filter class to list up qualified files in parallel.
+ */
+private[spark] abstract class SerializablePathFilter extends Serializable {
--- End diff --

yea, fixed.
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69706488

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -230,6 +229,15 @@ trait FileFormat {
 }

 /**
+ * Return a `SerializablePathFilter` class to filter qualified files for this format.
+ */
+  def getPathFilter(options: Map[String, String]): SerializablePathFilter = {
+    new SerializablePathFilter {
+      override def accept(path: Path): Boolean = true
+    }
--- End diff --

We can replace this with `passThroughPathFilter` after moving it into this file.
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69706413

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala ---
@@ -156,3 +162,10 @@ class ListingFileCatalog(
   override def hashCode(): Int = paths.toSet.hashCode()
 }
+
+object ListingFileCatalog {
+  /** A default path filter to pass through all input paths. */
+  val passThroughPathFilter = new SerializablePathFilter {
+    override def accept(path: Path): Boolean = true
+  }
--- End diff --

Let's move this into `fileSourceInterfaces.scala`.
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69706379

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala ---
@@ -30,6 +29,13 @@ import org.apache.spark.sql.types.StructType

 /**
+ * A filter class to list up qualified files in parallel.
+ */
+private[spark] abstract class SerializablePathFilter extends Serializable {
--- End diff --

Also, it probably makes more sense to move this class into `fileSourceInterfaces.scala` since it's part of the public interface.
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69706249

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala ---
@@ -30,6 +29,13 @@ import org.apache.spark.sql.types.StructType

 /**
+ * A filter class to list up qualified files in parallel.
+ */
+private[spark] abstract class SerializablePathFilter extends Serializable {
--- End diff --

okay
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69706044

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala ---
@@ -30,6 +29,13 @@ import org.apache.spark.sql.types.StructType

 /**
+ * A filter class to list up qualified files in parallel.
+ */
+private[spark] abstract class SerializablePathFilter extends Serializable {
--- End diff --

Maybe just `PathFilter`.
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69521137

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -230,6 +236,15 @@ trait FileFormat {
 }

 /**
+ * Return a `SerializablePathFilter` class to filter qualified files for this format.
+ */
+  def getPathFilter(): SerializablePathFilter = {
--- End diff --

yea, my bad. I'll re-check the whole code to remove `Options`.
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69520870

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -172,6 +171,13 @@ case class HadoopFsRelation(
 }

 /**
+ * A helper class to list up qualified files in parallel.
+ */
+private[spark] abstract class SerializablePathFilter extends PathFilter with Serializable {
--- End diff --

okay, I'll remove the dependency.
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69520532

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -172,6 +171,13 @@ case class HadoopFsRelation(
 }

 /**
+ * A helper class to list up qualified files in parallel.
+ */
+private[spark] abstract class SerializablePathFilter extends PathFilter with Serializable {
--- End diff --

yea, we need `Serializable` here.
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69520473

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -437,11 +442,26 @@ private[sql] object HadoopFsRelation extends Logging {
       accessTime: Long,
       blockLocations: Array[FakeBlockLocation])

+  private[sql] def mergePathFilter(
+      filter1: Option[PathFilter], filter2: Option[PathFilter]): Path => Boolean = {
+    (filter1, filter2) match {
+      case (Some(f1), Some(f2)) =>
+        (path: Path) => f1.accept(path) && f2.accept(path)
+      case (Some(f1), None) =>
+        (path: Path) => f1.accept(path)
+      case (None, Some(f2)) =>
+        (path: Path) => f2.accept(path)
+      case (None, None) =>
+        (path: Path) => true
+    }
--- End diff --

This can be more concise:

```scala
(filter1 ++ filter2).reduceOption { (f1, f2) =>
  (path: Path) => f1.accept(path) && f2.accept(path)
}.getOrElse { (path: Path) => true }
```
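The suggested `reduceOption` form and the original four-way pattern match compute the same predicate; the trick is that `Option ++ Option` yields an iterable of zero, one, or two filters. A self-contained sketch, using plain `String => Boolean` functions in place of Hadoop's `PathFilter` so it runs standalone (`MergeFilterSketch` and the sample filters are illustrative names, not Spark code):

```scala
// Sketch of the concise filter merge: zero, one, or two optional filters
// collapse into a single predicate; no filters means "accept everything".
object MergeFilterSketch {
  type Filter = String => Boolean

  def mergePathFilter(filter1: Option[Filter], filter2: Option[Filter]): Filter =
    (filter1 ++ filter2).reduceOption { (f1, f2) =>
      (path: String) => f1(path) && f2(path)
    }.getOrElse((_: String) => true)

  def main(args: Array[String]): Unit = {
    val noUnderscore: Filter = (s: String) => !s.startsWith("_")
    val parquetOnly: Filter = (s: String) => s.endsWith(".parquet")
    val merged = mergePathFilter(Some(noUnderscore), Some(parquetOnly))
    assert(merged("part-0.parquet"))                 // passes both filters
    assert(!merged("_metadata.parquet"))             // rejected by the first filter
    assert(!merged("part-0.json"))                   // rejected by the second filter
    assert(mergePathFilter(None, None)("anything"))  // no filters: accept all
    println("ok")
  }
}
```

The merged predicate is a closure over both filters, which is why the `Serializable` discussion below matters once listing happens on executors.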
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69520156

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -172,6 +171,13 @@ case class HadoopFsRelation(
 }

 /**
+ * A helper class to list up qualified files in parallel.
+ */
+private[spark] abstract class SerializablePathFilter extends PathFilter with Serializable {
--- End diff --

Oh I see, because parallel file listing may filter input files on the executor side.
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69519931

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -172,6 +171,13 @@ case class HadoopFsRelation(
 }

 /**
+ * A helper class to list up qualified files in parallel.
+ */
+private[spark] abstract class SerializablePathFilter extends PathFilter with Serializable {
--- End diff --

I probably missed something here, but why does it have to be `Serializable`?
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69519846

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -230,6 +236,15 @@ trait FileFormat {
 }

 /**
+ * Return a `SerializablePathFilter` class to filter qualified files for this format.
+ */
+  def getPathFilter(): SerializablePathFilter = {
--- End diff --

What are the semantics of the return value of this method? It seems that it should never return a null filter, since it defaults to an "accept all" filter. If this is true, it's unnecessary to wrap returned filters in `Option` elsewhere in this PR.
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69519641

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -172,6 +171,13 @@ case class HadoopFsRelation(
 }

 /**
+ * A helper class to list up qualified files in parallel.
+ */
+private[spark] abstract class SerializablePathFilter extends PathFilter with Serializable {
--- End diff --

Extending `PathFilter` makes the internal implementation a little bit easier, but I'd prefer to avoid depending on Hadoop classes/interfaces in Spark SQL public interfaces whenever possible.
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69519099

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -230,6 +236,15 @@ trait FileFormat {
 }

 /**
+ * Return a `SerializablePathFilter` class to filter qualified files for this format.
+ */
+  def getPathFilter(): SerializablePathFilter = {
--- End diff --

okay, I'll fix it now.
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14038#discussion_r69518421

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -230,6 +236,15 @@ trait FileFormat {
 }

 /**
+ * Return a `SerializablePathFilter` class to filter qualified files for this format.
+ */
+  def getPathFilter(): SerializablePathFilter = {
--- End diff --

Shall we add either the data source options map or the Hadoop conf as an argument of this method? For example, the Avro data source may filter out all input files whose file names don't end with ".avro" if the Hadoop conf "avro.mapred.ignore.inputs.without.extension" is set to true. This is consistent with the default behavior of `AvroInputFormat`.
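The Avro scenario above can be sketched as an options-aware filter factory. This is a hypothetical illustration of the suggestion, not the real `FileFormat` API: `OptionAwareFilterSketch`, the `String => Boolean` filter type, and reading the Avro key from a plain options map are all assumptions made for the sake of a runnable example (the key itself, "avro.mapred.ignore.inputs.without.extension", is the one named in the comment).

```scala
// Hypothetical sketch: derive a path filter from a data source options map,
// mirroring how an Avro source might honor its extension-filtering flag.
object OptionAwareFilterSketch {
  def getPathFilter(options: Map[String, String]): String => Boolean = {
    val ignoreWithoutExt = options
      .getOrElse("avro.mapred.ignore.inputs.without.extension", "false")
      .toBoolean
    if (ignoreWithoutExt) (name: String) => name.endsWith(".avro")
    else (_: String) => true // default: accept every path
  }

  def main(args: Array[String]): Unit = {
    val strict = getPathFilter(
      Map("avro.mapred.ignore.inputs.without.extension" -> "true"))
    assert(strict("data.avro"))
    assert(!strict("data.txt"))
    val lenient = getPathFilter(Map.empty)  // flag unset: everything passes
    assert(lenient("data.txt"))
    println("ok")
  }
}
```

Passing the options map (or Hadoop conf) into the factory is what lets each format decide its own filtering policy without hard-coding it into the listing code.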
GitHub user maropu reopened a pull request: https://github.com/apache/spark/pull/14038

[SPARK-16317][SQL] Add a new interface to filter files in FileFormat

## What changes were proposed in this pull request?

This PR adds an interface for filtering files in `FileFormat`, so that invalid files are not passed into `FileFormat#buildReader`.

## How was this patch tested?

Added tests that filter files in a driver and in parallel.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-16317

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14038.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14038

commit 67703098f96da37fbe23e0f2d76017698671d5e2
Author: Takeshi YAMAMURO
Date: 2016-07-04T02:13:34Z

    Add a new interface to filter files in FileFormat
Github user maropu closed the pull request at: https://github.com/apache/spark/pull/14038
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/14038

[SPARK-16317][SQL] Add a new interface to filter files in FileFormat

## What changes were proposed in this pull request?

This PR adds an interface for filtering files in `FileFormat`, so that invalid files are not passed into `FileFormat#buildReader`.

## How was this patch tested?

Added tests that filter files in a driver and in parallel.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-16317

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14038.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14038

commit 67703098f96da37fbe23e0f2d76017698671d5e2
Author: Takeshi YAMAMURO
Date: 2016-07-04T02:13:34Z

    Add a new interface to filter files in FileFormat