[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14207
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62552/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14207
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14207
  
**[Test build #62552 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62552/consoleFull)**
 for PR 14207 at commit 
[`a043ca2`](https://github.com/apache/spark/commit/a043ca28fc06082bc8b4104d9b38f2fbf1aa337a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14207
  
**[Test build #62552 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62552/consoleFull)**
 for PR 14207 at commit 
[`a043ca2`](https://github.com/apache/spark/commit/a043ca28fc06082bc8b4104d9b38f2fbf1aa337a).



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14207
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62513/



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14207
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14207
  
**[Test build #62513 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62513/consoleFull)**
 for PR 14207 at commit 
[`55c2c5e`](https://github.com/apache/spark/commit/55c2c5e2623478a79971af3b0513727b03c1ee87).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14207
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14207
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62512/



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14207
  
**[Test build #62512 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62512/consoleFull)**
 for PR 14207 at commit 
[`c6afbbb`](https://github.com/apache/spark/commit/c6afbbb9941113d6a78bfd3aaa627653ba0f6151).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14207
  
**[Test build #62513 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62513/consoleFull)**
 for PR 14207 at commit 
[`55c2c5e`](https://github.com/apache/spark/commit/55c2c5e2623478a79971af3b0513727b03c1ee87).



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14207
  
**[Test build #62512 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62512/consoleFull)**
 for PR 14207 at commit 
[`c6afbbb`](https://github.com/apache/spark/commit/c6afbbb9941113d6a78bfd3aaa627653ba0f6151).



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14207
  
> when the data/files are changed by external system (e.g., appended by a 
streaming system), the stored schema can be inconsistent with the actual schema 
of the data.

I think this problem already exists, since we use the cached schema instead 
of inferring it every time. The only difference is that after a restart, this 
PR will still use the stored schema and require users to refresh the table 
manually.



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14207
  
@gatorsmile Yeah. I meant that since you use the stored schema instead of 
inferring the table's schema, when the data/files are changed by an external 
system (e.g., appended by a streaming system), the stored schema can become 
inconsistent with the actual schema of the data.



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-18 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14207
  
@viirya Schema inference is time-consuming, especially when the number of 
files is huge. Thus, we should avoid refreshing it every time. That is one of 
the major reasons why we have a metadata cache for data source tables. 



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14207
  
@gatorsmile What about when the data/files are written by an external 
system and Spark is just used to process them in batch? Does that mean the 
schema can be inconsistent? Or should users call refresh every time they 
query the table?



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-18 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14207
  
The table location is not allowed to change, right? 

With this PR, if changes to the data/files (pointed to by the table 
location) affect the table schema, users need to call the `REFRESH` command 
manually. Restarting Spark will not change the schema.

Before this PR, if users restart Spark or the corresponding cache entry is 
evicted, the table schema could change without notice. This could be an issue 
when reads and writes happen in parallel, and this undocumented behavior could 
complicate Spark applications. 

Such unexpected changes should be avoided. If the schema has changed and the 
applications reading the table are ready for the new schema, users should 
issue the `REFRESH` command manually.
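To make the before/after contrast concrete, here is a toy Python model. It is a hypothetical sketch, not Spark's implementation: `infer_schema`, the two table classes, and the representation of each data file as a list of column names are all illustrative assumptions. It only reproduces the behavior described above: an in-memory cached schema silently changes after a restart, while a schema stored in the catalog changes only on an explicit refresh.

```python
# Toy model (hypothetical, not Spark code) of the two behaviors discussed above.

def infer_schema(files):
    """Hypothetical stand-in for schema inference: the union of column names
    seen across all files, in first-seen order. Real inference scans data."""
    columns = []
    for f in files:
        for col in f:
            if col not in columns:
                columns.append(col)
    return tuple(columns)


class InMemoryCachedTable:
    """Before the PR (as described): the inferred schema lives only in an
    in-memory metadata cache, so a restart silently re-infers it."""

    def __init__(self, files):
        self.files = files
        self._cache = None

    def schema(self):
        if self._cache is None:
            self._cache = infer_schema(self.files)
        return self._cache

    def restart(self):
        self._cache = None  # the in-memory cache does not survive a restart


class CatalogStoredTable:
    """With the PR (as described): the inferred schema is persisted in the
    external catalog and only changes when the user refreshes the table."""

    def __init__(self, files):
        self.files = files
        self._stored = infer_schema(files)

    def schema(self):
        return self._stored

    def restart(self):
        pass  # the stored schema survives a restart

    def refresh(self):
        self._stored = infer_schema(self.files)  # models a manual REFRESH


files = [["id", "value"]]
before = InMemoryCachedTable(files)
after = CatalogStoredTable(files)
print(before.schema(), after.schema())  # ('id', 'value') ('id', 'value')

files.append(["id", "value", "ts"])  # an external writer adds a new column
before.restart()
after.restart()
print(before.schema())  # ('id', 'value', 'ts'): changed without notice
print(after.schema())   # ('id', 'value'): unchanged until refreshed
after.refresh()
print(after.schema())   # ('id', 'value', 'ts')
```

The design choice being illustrated is predictability: the stored variant trades automatic pickup of new columns for a schema that only moves when someone asks it to.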





[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14207
  
Does that mean that if users do not issue a refresh when the table location 
is changed, the schema will be wrong after Spark restarts?



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-18 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14207
  
@viirya The problem it tries to resolve comes from @rxin's comment in 
another PR: https://github.com/apache/spark/pull/14148#issuecomment-232273833



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14207
  
I think it is not clear what problem this PR tries to solve. The description 
just says it proposes to save the inferred schema in the external catalog.



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-14 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14207
  
@rxin @cloud-fan @yhuai 

This PR introduces a new concept, `SchemaType`, for recording the original 
source of a schema. When `SchemaType` is `USER`, the table belongs to 
`Group A`. When it is `INFERRED`, the table requires schema inference, that is, 
it belongs to `Group B`.

Not sure whether this solution sounds OK to you. Let me know whether this 
is the right direction for resolving the issue. Thanks!
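A minimal sketch of how a `SchemaType` flag could drive schema resolution, written in Python for illustration only. The `resolve_schema` helper, its `refresh` parameter, and the `fake_infer` callable are hypothetical and do not reflect the PR's actual code. The assumption is that a `USER` schema is always taken as-is, while an `INFERRED` schema is read from the catalog and re-inferred only when none is stored yet or the user explicitly refreshes.

```python
from enum import Enum


class SchemaType(Enum):
    """Hypothetical mirror of the proposed flag recording a schema's origin."""
    USER = "USER"          # Group A: schema specified by the user
    INFERRED = "INFERRED"  # Group B: schema must be inferred from the data


def resolve_schema(schema_type, stored_schema, infer, refresh=False):
    """Return the schema to use for a table (illustrative logic only).

    schema_type   -- SchemaType of the table
    stored_schema -- schema persisted in the catalog, or None if absent
    infer         -- zero-argument callable performing the (expensive) inference
    refresh       -- True when the user issued a refresh for this table
    """
    if schema_type is SchemaType.USER:
        # A user-specified schema is authoritative; never re-infer it.
        return stored_schema
    if stored_schema is None or refresh:
        # Infer only when nothing is stored yet or the user asked for it.
        return infer()
    # Otherwise reuse the stored inferred schema and skip scanning the files.
    return stored_schema


# Usage: record how often the expensive inference actually runs.
calls = []


def fake_infer():
    calls.append(1)
    return ("id", "value")


print(resolve_schema(SchemaType.INFERRED, None, fake_infer))           # ('id', 'value')
print(resolve_schema(SchemaType.INFERRED, ("id",), fake_infer))        # ('id',)
print(resolve_schema(SchemaType.INFERRED, ("id",), fake_infer, True))  # ('id', 'value')
print(resolve_schema(SchemaType.USER, ("a", "b"), fake_infer, True))   # ('a', 'b')
print(len(calls))                                                      # 2
```

In this sketch inference runs only twice across the four lookups, which is the cost saving the metadata cache discussion above is about.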



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14207
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62344/



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14207
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14207
  
**[Test build #62344 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62344/consoleFull)**
 for PR 14207 at commit 
[`3be0dc0`](https://github.com/apache/spark/commit/3be0dc0b7cfd942459c598c0d35f3d67a2c020ba).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14207: [SPARK-16552] [SQL] [WIP] Store the Inferred Schemas int...

2016-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14207
  
**[Test build #62344 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62344/consoleFull)**
 for PR 14207 at commit 
[`3be0dc0`](https://github.com/apache/spark/commit/3be0dc0b7cfd942459c598c0d35f3d67a2c020ba).