[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12247 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-209024772 Thanks, merging to master.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59419309
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -129,8 +129,17 @@ trait SchemaRelationProvider {
  * Implemented by objects that can produce a streaming [[Source]] for a specific format or system.
  */
 trait StreamSourceProvider {
+
+  /** Returns the name and schema of the source that can be used to continually read data. */
+  def sourceSchema(
+      sqlContext: SQLContext,
+      schema: Option[StructType],
+      providerName: String,
+      parameters: Map[String, String]): (String, StructType)
+
   def createSource(
       sqlContext: SQLContext,
+      metadataPath: String,
--- End diff --

This is called `metadataPath` to avoid confusion with `checkpointLocation`, since they are not the same path.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-208651263 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55548/ Test PASSed.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-208651259 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-208651028 **[Test build #55548 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55548/consoleFull)** for PR 12247 at commit [`4cb1608`](https://github.com/apache/spark/commit/4cb16085590de943aea9972274f7f2d114125653). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59303566
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
@@ -341,6 +347,33 @@ class FileStreamSourceSuite extends FileStreamSourceTest with SharedSQLContext {
     Utils.deleteRecursively(tmp)
   }

+  test("metadataPath should be in checkpointLocation") {
--- End diff --

I removed this test as now `metadataPath` is for all `Source`s.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-208616091 **[Test build #55548 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55548/consoleFull)** for PR 12247 at commit [`4cb1608`](https://github.com/apache/spark/commit/4cb16085590de943aea9972274f7f2d114125653).
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-208607923 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55539/ Test PASSed.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-208607918 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-208607307 **[Test build #55539 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55539/consoleFull)** for PR 12247 at commit [`a761692`](https://github.com/apache/spark/commit/a761692ed8eb752989fd03f6ec4a0d71a11880d8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59297483
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -129,8 +129,17 @@ trait SchemaRelationProvider {
  * Implemented by objects that can produce a streaming [[Source]] for a specific format or system.
  */
 trait StreamSourceProvider {
+
+  /** Returns the name and schema of the source that can be used to continually read data. */
+  def sourceSchema(
+      sqlContext: SQLContext,
+      schema: Option[StructType],
+      providerName: String,
+      parameters: Map[String, String]): (String, StructType)
+
   def createSource(
       sqlContext: SQLContext,
+      sourceId: Long,
--- End diff --

Makes sense. I will update it.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59297156
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -129,8 +129,17 @@ trait SchemaRelationProvider {
  * Implemented by objects that can produce a streaming [[Source]] for a specific format or system.
  */
 trait StreamSourceProvider {
+
+  /** Returns the name and schema of the source that can be used to continually read data. */
+  def sourceSchema(
+      sqlContext: SQLContext,
+      schema: Option[StructType],
+      providerName: String,
+      parameters: Map[String, String]): (String, StructType)
+
   def createSource(
       sqlContext: SQLContext,
+      sourceId: Long,
--- End diff --

I thought the goal was to have all the data in the same location. With this API everyone needs to duplicate the checkpoint location resolution logic. Note that if you want a unique identifier the path also qualifies.
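The point about centralizing the resolution can be sketched as follows: the engine derives one metadata directory per source from the checkpoint location and hands each source only its own path. `MetadataPaths`, `forSources`, and the `sources/<n>` layout are hypothetical names chosen for illustration; they are not taken from the patch.

```scala
// Hypothetical helper: the object name, method name, and `sources/<n>` layout
// are illustrative assumptions, not the code under review.
object MetadataPaths {
  /** Resolves one metadata directory per source under a query's checkpoint location. */
  def forSources(checkpointLocation: String, numSources: Int): Seq[String] = {
    val root = checkpointLocation.stripSuffix("/")
    // Each source gets its own subdirectory, so the path is unique per source.
    (0 until numSources).map(i => s"$root/sources/$i")
  }
}
```

Because every resolved path is distinct, the path itself can serve as the unique identifier, and individual sources never need to know how the checkpoint location was resolved.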
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59296806
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -129,8 +129,17 @@ trait SchemaRelationProvider {
  * Implemented by objects that can produce a streaming [[Source]] for a specific format or system.
  */
 trait StreamSourceProvider {
+
+  /** Returns the name and schema of the source that can be used to continually read data. */
+  def sourceSchema(
+      sqlContext: SQLContext,
+      schema: Option[StructType],
+      providerName: String,
+      parameters: Map[String, String]): (String, StructType)
+
   def createSource(
       sqlContext: SQLContext,
+      sourceId: Long,
--- End diff --

I think some `Source`s may not need a location. Instead, they just need an id to distinguish them.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59295994
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -129,8 +129,17 @@ trait SchemaRelationProvider {
  * Implemented by objects that can produce a streaming [[Source]] for a specific format or system.
  */
 trait StreamSourceProvider {
+
+  /** Returns the name and schema of the source that can be used to continually read data. */
+  def sourceSchema(
+      sqlContext: SQLContext,
+      schema: Option[StructType],
+      providerName: String,
+      parameters: Map[String, String]): (String, StructType)
+
   def createSource(
       sqlContext: SQLContext,
+      sourceId: Long,
--- End diff --

Why are we passing the `sourceId` instead of the location?
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-208593200 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55537/ Test PASSed.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-208593199 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-208592951 **[Test build #55537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55537/consoleFull)** for PR 12247 at commit [`7a818a9`](https://github.com/apache/spark/commit/7a818a9500b8f73abc8a3ef441093c3ae65e0cef). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-208591595 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-208591596 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55536/ Test PASSed.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-208591339 **[Test build #55536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55536/consoleFull)** for PR 12247 at commit [`61fe406`](https://github.com/apache/spark/commit/61fe40674dfa1a3b1dc9f586b54d5a9993a1d67e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-208576061 **[Test build #55539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55539/consoleFull)** for PR 12247 at commit [`a761692`](https://github.com/apache/spark/commit/a761692ed8eb752989fd03f6ec4a0d71a11880d8).
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59286840
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
@@ -341,6 +347,33 @@ class FileStreamSourceSuite extends FileStreamSourceTest with SharedSQLContext {
     Utils.deleteRecursively(tmp)
   }

+  test("metadataPath should be in checkpointLocation") {
--- End diff --

I want to check the `FileStreamSource.metadataPath` value. Let me just make it public to avoid the reflection.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59282240
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
@@ -341,6 +347,33 @@ class FileStreamSourceSuite extends FileStreamSourceTest with SharedSQLContext {
     Utils.deleteRecursively(tmp)
   }

+  test("metadataPath should be in checkpointLocation") {
--- End diff --

What are you really testing here? That it's not just blindly ignoring the parameter that is passed to it? Given the amount of reflection you are adding here, it seems likely that the cost of maintaining this test outweighs its utility.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59280899
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
@@ -341,6 +347,33 @@ class FileStreamSourceSuite extends FileStreamSourceTest with SharedSQLContext {
     Utils.deleteRecursively(tmp)
   }

+  test("metadataPath should be in checkpointLocation") {
--- End diff --

`metadataPath` is only for `FileStreamSource`, so I think this test belongs in `FileStreamSourceSuite`. I added a test for source ids in `DataFrameReaderWriterSuite`.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-208559400 **[Test build #55537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55537/consoleFull)** for PR 12247 at commit [`7a818a9`](https://github.com/apache/spark/commit/7a818a9500b8f73abc8a3ef441093c3ae65e0cef).
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-208559378 Updated
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-208558013 **[Test build #55536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55536/consoleFull)** for PR 12247 at commit [`61fe406`](https://github.com/apache/spark/commit/61fe40674dfa1a3b1dc9f586b54d5a9993a1d67e).
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59256046
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala ---
@@ -67,12 +62,33 @@ class FileStreamSource(
   }

   /**
+   * Set the metadata path. This method should be called before using [[FileStreamSource]].
+   */
+  def setMetadataPath(metadataPath: String): Unit = {
--- End diff --

Sure, but if you find yourself hacking around the fact that we don't know some information at some point in the control flow, and it's making the implementation a lot more complicated, then we need to rethink the control flow.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59255827
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -123,8 +123,16 @@ case class DataSource(
     }
   }

-  /** Returns a source that can be used to continually read data. */
-  def createSource(): Source = {
+  /**
+   * Returns a source that can be used to continually read data.
+   *
+   * Before running a real query (e.g., df.explain), `sourceId` and `checkpointLocation` is None
+   * as they are unknown. [[ContinuousQueryManager]] should set `sourceId` and `checkpointLocation`
+   * before starting a query.
+   */
+  def createSource(
+      sourceId: Option[Long] = None,
+      checkpointLocation: Option[String] = None): Source = {
--- End diff --

Yeah, and we also don't really need to create a source there (we only need to know the schema). Perhaps getting the schema should be separated from getting the source (like we do in FileFormat).
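A minimal sketch of the separation suggested above: the schema lookup needs no checkpoint information and can run at read time, while the source is created only once the metadata path is known. `Schema`, `SketchSource`, `SketchSourceProvider`, and `TextSketchProvider` are stand-in names invented for this example; they are not Spark's classes or the signatures from the patch.

```scala
// Stand-in types so the sketch is self-contained; these are not Spark's classes.
final case class Schema(fieldNames: Seq[String])
final case class SketchSource(name: String, schema: Schema, metadataPath: String)

trait SketchSourceProvider {
  /** Cheap lookup used while building the plan; needs no checkpoint information. */
  def sourceSchema(
      userSchema: Option[Schema],
      parameters: Map[String, String]): (String, Schema)

  /** Called only when the query starts, once the metadata path is known. */
  def createSource(
      metadataPath: String,
      userSchema: Option[Schema],
      parameters: Map[String, String]): SketchSource
}

// Hypothetical provider with a fixed default schema.
object TextSketchProvider extends SketchSourceProvider {
  private val defaultSchema = Schema(Seq("value"))

  def sourceSchema(
      userSchema: Option[Schema],
      parameters: Map[String, String]): (String, Schema) =
    ("text", userSchema.getOrElse(defaultSchema))

  def createSource(
      metadataPath: String,
      userSchema: Option[Schema],
      parameters: Map[String, String]): SketchSource = {
    val (name, schema) = sourceSchema(userSchema, parameters)
    SketchSource(name, schema, metadataPath)
  }
}
```

With this split, the reader side can call `sourceSchema` without inventing placeholder values for information that only exists once the query is started.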
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59255253
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala ---
@@ -67,12 +62,33 @@ class FileStreamSource(
   }

   /**
+   * Set the metadata path. This method should be called before using [[FileStreamSource]].
+   */
+  def setMetadataPath(metadataPath: String): Unit = {
--- End diff --

> I'd really prefer to avoid the pattern of having an initialization that is separate from the constructor.

Same as above. We don't know `metadataPath` when `DataSource.createSource` is called in `DataFrameReader`.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59254961
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/MemorySinkSuite.scala ---
@@ -59,7 +59,7 @@ class MemorySinkSuite extends StreamTest with SharedSQLContext {
   }

   test("error if attempting to resume specific checkpoint") {
-    val location = Utils.createTempDir("steaming.checkpoint").getCanonicalPath
+    val location = Utils.createTempDir(namePrefix = "steaming.checkpoint").getCanonicalPath
--- End diff --

> Why this change?

To avoid creating `steaming.checkpoint` in the sql folder. Without it, I have to clean my repo after running this test.
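The behaviour described here is what one would expect if the helper takes the parent directory as its first parameter and the name prefix as its second, both with defaults. The sketch below assumes exactly that; `TempDirs.createTempDir` is a hypothetical stand-in written for this example, not Spark's `Utils.createTempDir`.

```scala
import java.nio.file.{Files, Path, Paths}

object TempDirs {
  // Hypothetical stand-in: the parameter order (parent directory first,
  // name prefix second) is an assumption inferred from the comment above.
  def createTempDir(
      root: String = System.getProperty("java.io.tmpdir"),
      namePrefix: String = "spark"): Path = {
    val parent = Paths.get(root)
    // A relative `root` is resolved against the current working directory.
    Files.createDirectories(parent)
    Files.createTempDirectory(parent, namePrefix + "-")
  }
}
```

Under that assumption, `TempDirs.createTempDir("steaming.checkpoint")` binds the string to `root` and creates a `steaming.checkpoint` directory under the working directory, whereas `TempDirs.createTempDir(namePrefix = "steaming.checkpoint")` keeps the default root and only changes the directory name.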
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59254687
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -123,8 +123,16 @@ case class DataSource(
     }
   }

-  /** Returns a source that can be used to continually read data. */
-  def createSource(): Source = {
+  /**
+   * Returns a source that can be used to continually read data.
+   *
+   * Before running a real query (e.g., df.explain), `sourceId` and `checkpointLocation` is None
+   * as they are unknown. [[ContinuousQueryManager]] should set `sourceId` and `checkpointLocation`
+   * before starting a query.
+   */
+  def createSource(
+      sourceId: Option[Long] = None,
+      checkpointLocation: Option[String] = None): Source = {
--- End diff --

`sourceId` and `checkpointLocation` are set via `DataFrameWriter`. When this one is called in `DataFrameReader`, we don't know them.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59254285

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
@@ -341,6 +347,33 @@ class FileStreamSourceSuite extends FileStreamSourceTest with SharedSQLContext {
     Utils.deleteRecursively(tmp)
   }

+  test("metadataPath should be in checkpointLocation") {
--- End diff --

Could we just test this in DataFrameReaderWriterSuite? This seems kind of integration heavy. It would be good to test that multiple sources get different ids too.
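The two properties named in this comment (each metadata path sits inside the checkpoint location, and distinct sources get distinct ids) can be checked without a running query. The sketch below is only an illustration: the `sources/<id>` layout and the names `metadataPath` and `assignPaths` are assumptions, not taken from the pull request.

```scala
object MetadataPathSketch {
  // Hypothetical layout: each source gets its own directory under the checkpoint location.
  def metadataPath(checkpointLocation: String, sourceId: Long): String =
    s"${checkpointLocation.stripSuffix("/")}/sources/$sourceId"

  // Assign consecutive ids to sources in the order they appear in a query.
  def assignPaths(checkpointLocation: String, sourceCount: Int): Seq[String] =
    (0 until sourceCount).map(id => metadataPath(checkpointLocation, id.toLong))

  def main(args: Array[String]): Unit = {
    val paths = assignPaths("/tmp/cp", sourceCount = 2)
    // Every path is under the checkpoint location, and no two sources share one.
    assert(paths.forall(_.startsWith("/tmp/cp/")))
    assert(paths.distinct.size == paths.size)
  }
}
```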
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59253946

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala ---
@@ -67,12 +62,33 @@ class FileStreamSource(
   }

   /**
+   * Set the metadata path. This method should be called before using [[FileStreamSource]].
+   */
+  def setMetadataPath(metadataPath: String): Unit = {
--- End diff --

I'd really prefer to avoid the pattern of having an initialization step that is separate from the constructor.
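The objection here is to two-phase initialization. A minimal sketch of the contrast follows; both classes are hypothetical and far simpler than `FileStreamSource`, and `logLocation` is an invented accessor used only to show when the value is available.

```scala
object ConstructorInitSketch {
  // Setter style: an instance can be used before it is fully initialized.
  final class SetterStyleSource(val path: String) {
    private var metadataPath: Option[String] = None

    def setMetadataPath(p: String): Unit = {
      metadataPath = Some(p)
    }

    // Fails at call time if the setter was forgotten.
    def logLocation: String =
      metadataPath.getOrElse(throw new IllegalStateException("metadataPath not set"))
  }

  // Constructor style: no instance can exist without a metadata path.
  final class ConstructorStyleSource(val path: String, val metadataPath: String) {
    def logLocation: String = metadataPath
  }
}
```

With the constructor style, forgetting the metadata path is a compile error rather than a runtime failure, which is one common argument for the reviewer's preference.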
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59253976

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/MemorySinkSuite.scala ---
@@ -59,7 +59,7 @@ class MemorySinkSuite extends StreamTest with SharedSQLContext {
   }

   test("error if attempting to resume specific checkpoint") {
-    val location = Utils.createTempDir("steaming.checkpoint").getCanonicalPath
+    val location = Utils.createTempDir(namePrefix = "steaming.checkpoint").getCanonicalPath
--- End diff --

Why this change?
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/12247#discussion_r59253837

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -123,8 +123,16 @@ case class DataSource(
     }
   }

-  /** Returns a source that can be used to continually read data. */
-  def createSource(): Source = {
+  /**
+   * Returns a source that can be used to continually read data.
+   *
+   * Before running a real query (e.g., df.explain), `sourceId` and `checkpointLocation` is None
+   * as they are unknown. [[ContinuousQueryManager]] should set `sourceId` and `checkpointLocation`
+   * before starting a query.
+   */
+  def createSource(
+      sourceId: Option[Long] = None,
+      checkpointLocation: Option[String] = None): Source = {
--- End diff --

Why are these optional?
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-207516450 cc @marmbrus
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-207216362 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55308/ Test PASSed.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-207216360 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-207216132

**[Test build #55308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55308/consoleFull)** for PR 12247 at commit [`d161f3a`](https://github.com/apache/spark/commit/d161f3adb978dc4ed519eb3318731ac05c247f5b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-207196573 **[Test build #55308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55308/consoleFull)** for PR 12247 at commit [`d161f3a`](https://github.com/apache/spark/commit/d161f3adb978dc4ed519eb3318731ac05c247f5b).
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-207195658 retest this please
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-207152941 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-207152943 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55270/ Test FAILed.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-207152790

**[Test build #55270 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55270/consoleFull)** for PR 12247 at commit [`d161f3a`](https://github.com/apache/spark/commit/d161f3adb978dc4ed519eb3318731ac05c247f5b).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12247#issuecomment-207136914 **[Test build #55270 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55270/consoleFull)** for PR 12247 at commit [`d161f3a`](https://github.com/apache/spark/commit/d161f3adb978dc4ed519eb3318731ac05c247f5b).
[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/12247

[SPARK-14474][SQL]Move FileSource offset log into checkpointLocation

## What changes were proposed in this pull request?

Now that we have a single location for storing checkpointed state, this PR propagates the checkpoint location into FileStreamSource so that we don't have one random log off on its own.

## How was this patch tested?

test("metadataPath should be in checkpointLocation")

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark file-source-log-location

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12247.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #12247

commit d161f3adb978dc4ed519eb3318731ac05c247f5b
Author: Shixiong Zhu
Date: 2016-04-07T22:27:12Z

    Move FileSource offset log into checkpointLocation