[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12247


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-12 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-209024772
  
Thanks, merging to master.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-12 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59419309
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -129,8 +129,17 @@ trait SchemaRelationProvider {
  * Implemented by objects that can produce a streaming [[Source]] for a specific format or system.
  */
 trait StreamSourceProvider {
+
+  /** Returns the name and schema of the source that can be used to continually read data. */
+  def sourceSchema(
+      sqlContext: SQLContext,
+      schema: Option[StructType],
+      providerName: String,
+      parameters: Map[String, String]): (String, StructType)
+
   def createSource(
       sqlContext: SQLContext,
+      metadataPath: String,
--- End diff --

This is called `metadataPath` to avoid confusion with `checkpointLocation`, 
since they are not the same path.
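
A minimal sketch of the distinction, assuming each source gets its own subdirectory under the query-level checkpoint location. The `sources/<ordinal>` layout and the helper name `metadataPathFor` are illustrative assumptions, not something shown in this diff:

```scala
// Hypothetical helper: derives a per-source metadata directory from the
// query-level checkpoint location. The "sources/<ordinal>" layout is an
// assumption made for illustration only.
object MetadataPaths {
  def metadataPathFor(checkpointLocation: String, sourceOrdinal: Int): String = {
    // Drop a trailing slash so the result never contains "//".
    val root = checkpointLocation.stripSuffix("/")
    s"$root/sources/$sourceOrdinal"
  }
}
```

Under this assumption, two sources in the same query share one `checkpointLocation` but each receives a different `metadataPath`, which is why the two names are kept apart.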





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-208651263
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55548/
Test PASSed.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-208651259
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-208651028
  
**[Test build #55548 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55548/consoleFull)**
 for PR 12247 at commit 
[`4cb1608`](https://github.com/apache/spark/commit/4cb16085590de943aea9972274f7f2d114125653).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59303566
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
@@ -341,6 +347,33 @@ class FileStreamSourceSuite extends FileStreamSourceTest with SharedSQLContext {
     Utils.deleteRecursively(tmp)
   }
 
+  test("metadataPath should be in checkpointLocation") {
--- End diff --

I removed this test because `metadataPath` now applies to all `Source`s.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-208616091
  
**[Test build #55548 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55548/consoleFull)**
 for PR 12247 at commit 
[`4cb1608`](https://github.com/apache/spark/commit/4cb16085590de943aea9972274f7f2d114125653).





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-208607923
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55539/
Test PASSed.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-208607918
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-208607307
  
**[Test build #55539 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55539/consoleFull)**
 for PR 12247 at commit 
[`a761692`](https://github.com/apache/spark/commit/a761692ed8eb752989fd03f6ec4a0d71a11880d8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59297483
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -129,8 +129,17 @@ trait SchemaRelationProvider {
  * Implemented by objects that can produce a streaming [[Source]] for a specific format or system.
  */
 trait StreamSourceProvider {
+
+  /** Returns the name and schema of the source that can be used to continually read data. */
+  def sourceSchema(
+      sqlContext: SQLContext,
+      schema: Option[StructType],
+      providerName: String,
+      parameters: Map[String, String]): (String, StructType)
+
   def createSource(
       sqlContext: SQLContext,
+      sourceId: Long,
--- End diff --

Makes sense. I will update it.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59297156
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -129,8 +129,17 @@ trait SchemaRelationProvider {
  * Implemented by objects that can produce a streaming [[Source]] for a specific format or system.
  */
 trait StreamSourceProvider {
+
+  /** Returns the name and schema of the source that can be used to continually read data. */
+  def sourceSchema(
+      sqlContext: SQLContext,
+      schema: Option[StructType],
+      providerName: String,
+      parameters: Map[String, String]): (String, StructType)
+
   def createSource(
       sqlContext: SQLContext,
+      sourceId: Long,
--- End diff --

I thought the goal was to have all the data in the same location.  With 
this API everyone needs to duplicate the checkpoint location resolution logic.

Note that if you want a unique identifier the path also qualifies.
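
To make the duplication concern concrete, here is a rough sketch of the two API shapes. The traits are simplified stand-ins rather than the real `StreamSourceProvider`, the returned `String` stands for the location a source would log to, and the `checkpointLocation` parameter key and `sources/<id>` layout are assumptions for illustration:

```scala
// Simplified stand-ins for the two API shapes discussed; not the Spark traits.
trait IdBasedProvider {
  // The provider receives only an id and must work out its own storage location.
  def createSource(sourceId: Long, parameters: Map[String, String]): String
}

trait PathBasedProvider {
  // The engine resolves the location once and hands it over.
  def createSource(metadataPath: String, parameters: Map[String, String]): String
}

object ProviderShapes {
  // With the id-based shape, every provider would repeat this resolution logic.
  val idBased: IdBasedProvider = new IdBasedProvider {
    def createSource(sourceId: Long, parameters: Map[String, String]): String = {
      val checkpoint = parameters.getOrElse("checkpointLocation", "/tmp/default-checkpoint")
      s"$checkpoint/sources/$sourceId"
    }
  }

  // With the path-based shape, the provider uses the path exactly as given.
  val pathBased: PathBasedProvider = new PathBasedProvider {
    def createSource(metadataPath: String, parameters: Map[String, String]): String =
      metadataPath
  }
}
```

A source that needs no storage can still treat the path as an opaque unique key, which is the sense in which the path also qualifies as an identifier.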





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59296806
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -129,8 +129,17 @@ trait SchemaRelationProvider {
  * Implemented by objects that can produce a streaming [[Source]] for a specific format or system.
  */
 trait StreamSourceProvider {
+
+  /** Returns the name and schema of the source that can be used to continually read data. */
+  def sourceSchema(
+      sqlContext: SQLContext,
+      schema: Option[StructType],
+      providerName: String,
+      parameters: Map[String, String]): (String, StructType)
+
   def createSource(
       sqlContext: SQLContext,
+      sourceId: Long,
--- End diff --

I think some `Source`s may not need a location. Instead, they just need an id 
to distinguish themselves.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59295994
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -129,8 +129,17 @@ trait SchemaRelationProvider {
  * Implemented by objects that can produce a streaming [[Source]] for a specific format or system.
  */
 trait StreamSourceProvider {
+
+  /** Returns the name and schema of the source that can be used to continually read data. */
+  def sourceSchema(
+      sqlContext: SQLContext,
+      schema: Option[StructType],
+      providerName: String,
+      parameters: Map[String, String]): (String, StructType)
+
   def createSource(
       sqlContext: SQLContext,
+      sourceId: Long,
--- End diff --

Why are we passing the `sourceId` instead of the location?





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-208593200
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55537/
Test PASSed.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-208593199
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-208592951
  
**[Test build #55537 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55537/consoleFull)**
 for PR 12247 at commit 
[`7a818a9`](https://github.com/apache/spark/commit/7a818a9500b8f73abc8a3ef441093c3ae65e0cef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-208591595
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-208591596
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55536/
Test PASSed.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-208591339
  
**[Test build #55536 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55536/consoleFull)**
 for PR 12247 at commit 
[`61fe406`](https://github.com/apache/spark/commit/61fe40674dfa1a3b1dc9f586b54d5a9993a1d67e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-208576061
  
**[Test build #55539 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55539/consoleFull)**
 for PR 12247 at commit 
[`a761692`](https://github.com/apache/spark/commit/a761692ed8eb752989fd03f6ec4a0d71a11880d8).





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59286840
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
@@ -341,6 +347,33 @@ class FileStreamSourceSuite extends FileStreamSourceTest with SharedSQLContext {
     Utils.deleteRecursively(tmp)
   }
 
+  test("metadataPath should be in checkpointLocation") {
--- End diff --

I want to check the value of `FileStreamSource.metadataPath`. Let me just make 
it public to avoid the reflection.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59282240
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
@@ -341,6 +347,33 @@ class FileStreamSourceSuite extends FileStreamSourceTest with SharedSQLContext {
     Utils.deleteRecursively(tmp)
   }
 
+  test("metadataPath should be in checkpointLocation") {
--- End diff --

What are you really testing here?  That it's not just blindly ignoring the 
parameter that is passed to it?  Given the amount of reflection you are adding 
here, it seems likely that the cost of maintaining this test outweighs its 
utility.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59280899
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
@@ -341,6 +347,33 @@ class FileStreamSourceSuite extends FileStreamSourceTest with SharedSQLContext {
     Utils.deleteRecursively(tmp)
   }
 
+  test("metadataPath should be in checkpointLocation") {
--- End diff --

`metadataPath` is only for `FileStreamSource`, so I think this test belongs 
in `FileStreamSourceSuite`.

I added a test for source ids in `DataFrameReaderWriterSuite`.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-208559400
  
**[Test build #55537 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55537/consoleFull)**
 for PR 12247 at commit 
[`7a818a9`](https://github.com/apache/spark/commit/7a818a9500b8f73abc8a3ef441093c3ae65e0cef).





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-208559378
  
Updated





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-208558013
  
**[Test build #55536 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55536/consoleFull)**
 for PR 12247 at commit 
[`61fe406`](https://github.com/apache/spark/commit/61fe40674dfa1a3b1dc9f586b54d5a9993a1d67e).





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59256046
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala ---
@@ -67,12 +62,33 @@ class FileStreamSource(
   }
 
   /**
+   * Set the metadata path. This method should be called before using [[FileStreamSource]].
+   */
+  def setMetadataPath(metadataPath: String): Unit = {
--- End diff --

Sure, but if you find yourself hacking around the fact that we don't know 
some information at some point in the control flow, and it's making the 
implementation a lot more complicated, then we need to rethink the control flow.
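
For illustration, a small sketch of the two patterns being contrasted. Both classes are hypothetical stand-ins, not the real `FileStreamSource`:

```scala
// Hypothetical stand-ins; neither class is the real FileStreamSource.

// Two-phase initialization: the object exists in an unusable state until
// setMetadataPath is called, so every use has to guard against that.
class TwoPhaseSource {
  private var metadataPath: Option[String] = None

  def setMetadataPath(path: String): Unit = { metadataPath = Some(path) }

  def logLocation: String =
    metadataPath.getOrElse(
      throw new IllegalStateException("setMetadataPath must be called first"))
}

// Constructor injection: an instance cannot exist without its path,
// so no guard is needed anywhere.
class ConstructedSource(val metadataPath: String) {
  def logLocation: String = metadataPath
}
```

The second form only works if the caller already knows the path when it constructs the source, which is the control-flow question raised here.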





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59255827
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -123,8 +123,16 @@ case class DataSource(
     }
   }
 
-  /** Returns a source that can be used to continually read data. */
-  def createSource(): Source = {
+  /**
+   * Returns a source that can be used to continually read data.
+   *
+   * Before running a real query (e.g., df.explain), `sourceId` and `checkpointLocation` is None
+   * as they are unknown. [[ContinuousQueryManager]] should set `sourceId` and `checkpointLocation`
+   * before starting a query.
+   */
+  def createSource(
+      sourceId: Option[Long] = None,
+      checkpointLocation: Option[String] = None): Source = {
--- End diff --

Yeah, and we also don't really need to create a source there (we only need 
to know the schema).  Perhaps getting the schema should be separated from 
getting the source (like we do in FileFormat).
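
A rough sketch of what that separation could look like. `Schema`, `Source`, and `DemoProvider` are hypothetical simplifications that only mirror the shape of the API under review, and the default single `value` column is an assumption:

```scala
// Simplified stand-ins: Schema, Source and DemoProvider are hypothetical.
final case class Schema(fieldNames: Seq[String])
final case class Source(name: String, schema: Schema, metadataPath: String)

object DemoProvider {
  // Cheap: needs no checkpoint information, so it can run while the
  // query is only being planned (for example, for explain).
  def sourceSchema(userSchema: Option[Schema], providerName: String): (String, Schema) =
    (providerName, userSchema.getOrElse(Schema(Seq("value"))))

  // Deferred until the query actually starts, when metadataPath is known.
  def createSource(
      userSchema: Option[Schema],
      providerName: String,
      metadataPath: String): Source = {
    val (name, schema) = sourceSchema(userSchema, providerName)
    Source(name, schema, metadataPath)
  }
}
```

With this split, nothing has to be passed as an `Option` that is `None` during planning and filled in later.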





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59255253
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala ---
@@ -67,12 +62,33 @@ class FileStreamSource(
   }
 
   /**
+   * Set the metadata path. This method should be called before using [[FileStreamSource]].
+   */
+  def setMetadataPath(metadataPath: String): Unit = {
--- End diff --

> I'd really prefer to avoid the pattern of having an initialization that is separate from the constructor.

Same as above. We don't know `metadataPath` when `DataSource.createSource` 
is called in `DataFrameReader`.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59254961
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/MemorySinkSuite.scala ---
@@ -59,7 +59,7 @@ class MemorySinkSuite extends StreamTest with 
SharedSQLContext {
   }
 
   test("error if attempting to resume specific checkpoint") {
-val location = 
Utils.createTempDir("steaming.checkpoint").getCanonicalPath
+val location = Utils.createTempDir(namePrefix = 
"steaming.checkpoint").getCanonicalPath
--- End diff --

> Why this change?

To avoid creating `steaming.checkpoint` in the sql folder. Passed positionally, 
the string is used as the root directory rather than the name prefix, so I had 
to clean my repo after running this test.
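
A small sketch of why the named argument matters. `TempDirSketch.createTempDir` below is a hypothetical stand-in, and its parameter order (root first, then `namePrefix`) is an assumption modeled on the call in the diff, not a copy of Spark's `Utils`.

```scala
import java.io.File
import java.nio.file.Files

object TempDirSketch {
  // Hypothetical helper; the parameter order is an assumption for illustration.
  def createTempDir(
      root: String = System.getProperty("java.io.tmpdir"),
      namePrefix: String = "spark"): File = {
    val parent = new File(root)
    // A relative root is created under the current working directory.
    parent.mkdirs()
    Files.createTempDirectory(parent.toPath, namePrefix).toFile
  }
}
```

Under that assumption, `createTempDir("steaming.checkpoint")` binds the string to `root` and leaves a `steaming.checkpoint` directory wherever the test runs, while `createTempDir(namePrefix = "steaming.checkpoint")` keeps the root at `java.io.tmpdir`.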





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59254687
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -123,8 +123,16 @@ case class DataSource(
 }
   }
 
-  /** Returns a source that can be used to continually read data. */
-  def createSource(): Source = {
+  /**
+   * Returns a source that can be used to continually read data.
+   *
+   * Before running a real query (e.g., df.explain), `sourceId` and 
`checkpointLocation` is None
+   * as they are unknown. [[ContinuousQueryManager]] should set `sourceId` 
and `checkpointLocation`
+   * before starting a query.
+   */
+  def createSource(
+  sourceId: Option[Long] = None,
+  checkpointLocation: Option[String] = None): Source = {
--- End diff --

`sourceId` and `checkpointLocation` are set via `DataFrameWriter`. When 
`createSource` is called in `DataFrameReader`, they are not known yet.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59254285
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
 ---
@@ -341,6 +347,33 @@ class FileStreamSourceSuite extends 
FileStreamSourceTest with SharedSQLContext {
 Utils.deleteRecursively(tmp)
   }
 
+  test("metadataPath should be in checkpointLocation") {
--- End diff --

Could we just test this in DataFrameReaderWriterSuite?  This seems kind of 
integration heavy.  It would be good to test that multiple sources get 
different ids too.
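
Roughly the property such a test would check, as a sketch. `SourceIdSketch` and the `sources/<id>` layout are assumptions for illustration, not the layout this PR implements.

```scala
object SourceIdSketch {
  // Hypothetical layout: one metadata directory per source under the checkpoint.
  def metadataPaths(checkpointLocation: String, numSources: Int): Seq[String] =
    (0 until numSources).map(id => s"$checkpointLocation/sources/$id")
}
```

A test along these lines would assert that every path sits under the checkpoint location and that no two sources share one, e.g. `paths.distinct.size == paths.size`.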





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59253946
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
 ---
@@ -67,12 +62,33 @@ class FileStreamSource(
   }
 
   /**
+   * Set the metadata path. This method should be called before using 
[[FileStreamSource]].
+   */
+  def setMetadataPath(metadataPath: String): Unit = {
--- End diff --

I'd really prefer to avoid the pattern of having an initialization that is 
separate from the constructor.
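
To make the concern concrete, a sketch contrasting the two styles. All class names here are hypothetical, and the deferred `SourceRecipe` is only one possible way to keep constructor injection when the path is not known up front; it is not what the PR does.

```scala
// Two-phase initialization: usable only after setMetadataPath is called.
class SetterStyleSource {
  private var metadataPath: Option[String] = None

  def setMetadataPath(path: String): Unit = {
    metadataPath = Some(path)
  }

  def logDir: String =
    metadataPath.getOrElse(throw new IllegalStateException("metadataPath not set"))
}

// Constructor injection: an instance cannot exist without its path.
class ConstructorStyleSource(metadataPath: String) {
  def logDir: String = metadataPath
}

// A deferred recipe describes the source first and builds it once the path is known.
final case class SourceRecipe(format: String) {
  def build(metadataPath: String): ConstructorStyleSource =
    new ConstructorStyleSource(metadataPath)
}
```

With the setter style, forgetting the call only fails at runtime on first use; with the constructor style, the mistake cannot be expressed at all.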





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59253976
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/MemorySinkSuite.scala ---
@@ -59,7 +59,7 @@ class MemorySinkSuite extends StreamTest with 
SharedSQLContext {
   }
 
   test("error if attempting to resume specific checkpoint") {
-val location = 
Utils.createTempDir("steaming.checkpoint").getCanonicalPath
+val location = Utils.createTempDir(namePrefix = 
"steaming.checkpoint").getCanonicalPath
--- End diff --

Why this change?





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/12247#discussion_r59253837
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -123,8 +123,16 @@ case class DataSource(
 }
   }
 
-  /** Returns a source that can be used to continually read data. */
-  def createSource(): Source = {
+  /**
+   * Returns a source that can be used to continually read data.
+   *
+   * Before running a real query (e.g., df.explain), `sourceId` and 
`checkpointLocation` is None
+   * as they are unknown. [[ContinuousQueryManager]] should set `sourceId` 
and `checkpointLocation`
+   * before starting a query.
+   */
+  def createSource(
+  sourceId: Option[Long] = None,
+  checkpointLocation: Option[String] = None): Source = {
--- End diff --

Why are these optional?





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-08 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-207516450
  
cc @marmbrus 





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-207216362
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55308/
Test PASSed.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-207216360
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-207216132
  
**[Test build #55308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55308/consoleFull)** for PR 12247 at commit [`d161f3a`](https://github.com/apache/spark/commit/d161f3adb978dc4ed519eb3318731ac05c247f5b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-207196573
  
**[Test build #55308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55308/consoleFull)** for PR 12247 at commit [`d161f3a`](https://github.com/apache/spark/commit/d161f3adb978dc4ed519eb3318731ac05c247f5b).





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-07 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-207195658
  
retest this please





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-207152941
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-207152943
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55270/
Test FAILed.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-207152790
  
**[Test build #55270 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55270/consoleFull)** for PR 12247 at commit [`d161f3a`](https://github.com/apache/spark/commit/d161f3adb978dc4ed519eb3318731ac05c247f5b).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12247#issuecomment-207136914
  
**[Test build #55270 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55270/consoleFull)** for PR 12247 at commit [`d161f3a`](https://github.com/apache/spark/commit/d161f3adb978dc4ed519eb3318731ac05c247f5b).





[GitHub] spark pull request: [SPARK-14474][SQL]Move FileSource offset log i...

2016-04-07 Thread zsxwing
GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/12247

[SPARK-14474][SQL]Move FileSource offset log into checkpointLocation

## What changes were proposed in this pull request?

Now that we have a single location for storing checkpointed state, this PR 
just propagates the checkpoint location into FileStreamSource so that we don't 
have one random log off on its own.

## How was this patch tested?

test("metadataPath should be in checkpointLocation")

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark file-source-log-location

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12247.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12247


commit d161f3adb978dc4ed519eb3318731ac05c247f5b
Author: Shixiong Zhu 
Date:   2016-04-07T22:27:12Z

Move FileSource offset log into checkpointLocation



