[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-221157062
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-221157064
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59175/
Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-221156923
  
**[Test build #59175 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59175/consoleFull)** for PR 12268 at commit [`66b1757`](https://github.com/apache/spark/commit/66b17570a8d1ad53b5073bbfa439eb01b05413c1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-221156930
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59174/
Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-221156929
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-221156828
  
**[Test build #59174 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59174/consoleFull)** for PR 12268 at commit [`d1f616e`](https://github.com/apache/spark/commit/d1f616e2880e1100f9ffe71981a6039720d0eff4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-221147030
  
**[Test build #59174 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59174/consoleFull)** for PR 12268 at commit [`d1f616e`](https://github.com/apache/spark/commit/d1f616e2880e1100f9ffe71981a6039720d0eff4).





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-221147706
  
**[Test build #59175 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59175/consoleFull)** for PR 12268 at commit [`66b1757`](https://github.com/apache/spark/commit/66b17570a8d1ad53b5073bbfa439eb01b05413c1).





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-218956629
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58538/
Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-218956628
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-218956508
  
**[Test build #58538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58538/consoleFull)** for PR 12268 at commit [`cbb1674`](https://github.com/apache/spark/commit/cbb1674ecb4a82bfdb3fed97cdd14adbdd14ffb6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-218948489
  
**[Test build #58538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58538/consoleFull)** for PR 12268 at commit [`cbb1674`](https://github.com/apache/spark/commit/cbb1674ecb4a82bfdb3fed97cdd14adbdd14ffb6).





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-217603358
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-217603319
  
**[Test build #58046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58046/consoleFull)** for PR 12268 at commit [`f2234e3`](https://github.com/apache/spark/commit/f2234e3f7bac02c396a8638f69baab740bc83bb1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class NoSuchPermanentFunctionException(db: String, func: String)`
  * `class NoSuchFunctionException(db: String, func: String)`
  * `case class GetExternalRowField(`





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-217603359
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58046/
Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-217599175
  
**[Test build #58046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58046/consoleFull)** for PR 12268 at commit [`f2234e3`](https://github.com/apache/spark/commit/f2234e3f7bac02c396a8638f69baab740bc83bb1).





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-217075071
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-217075074
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57834/
Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-217074901
  
**[Test build #57834 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57834/consoleFull)** for PR 12268 at commit [`a0aed27`](https://github.com/apache/spark/commit/a0aed27b7169caee50d0e97bceb6653202ba3f04).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-217066498
  
**[Test build #57834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57834/consoleFull)** for PR 12268 at commit [`a0aed27`](https://github.com/apache/spark/commit/a0aed27b7169caee50d0e97bceb6653202ba3f04).





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-216097352
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-216097353
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57498/
Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-216097282
  
**[Test build #57498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57498/consoleFull)** for PR 12268 at commit [`8e1bdf7`](https://github.com/apache/spark/commit/8e1bdf7176296eb9bd10f1249dd951abd0094191).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-216094877
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57496/
Test FAILed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-216094875
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-216094838
  
**[Test build #57496 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57496/consoleFull)** for PR 12268 at commit [`bd510c2`](https://github.com/apache/spark/commit/bd510c2b309f1da0099205838dd7856737c8ab61).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-216091185
  
**[Test build #57498 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57498/consoleFull)** for PR 12268 at commit [`8e1bdf7`](https://github.com/apache/spark/commit/8e1bdf7176296eb9bd10f1249dd951abd0094191).





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-05-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-216090185
  
**[Test build #57496 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57496/consoleFull)** for PR 12268 at commit [`bd510c2`](https://github.com/apache/spark/commit/bd510c2b309f1da0099205838dd7856737c8ab61).





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-29 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-215928432
  
Since this is almost a complete rewrite, I think we should only consider it 
early in the release cycle, i.e. for 2.1, not for 2.0 when we are so close.






[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-215907300
  
@rxin, @hvanhovell Do you mind if I ask your thoughts on this please?





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61366691
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * Converts a CSV string to a sequence of strings.
+ */
+private[csv] object UnivocityParser extends Logging {
+  /**
+   * Convert the input iterator to an iterator of [[InternalRow]]s.
+   */
+  def parseCsv(
+      iter: Iterator[String],
+      schema: StructType,
+      requiredSchema: StructType,
+      headers: Array[String],
+      shouldDropHeader: Boolean,
+      options: CSVOptions): Iterator[InternalRow] = {
+    if (shouldDropHeader) {
+      CSVUtils.dropHeaderLine(iter, options)
+    }
+    val csv = CSVUtils.filterCommentAndEmpty(iter, options)
+
+    val schemaFields = schema.fields
+    val requiredFields = requiredSchema.fields
+    val safeRequiredFields = if (options.dropMalformed) {
+      // If `dropMalformed` is enabled, then it needs to parse all the values
+      // so that we can decide which row is malformed.
+      requiredFields ++ schemaFields.filterNot(requiredFields.contains(_))
+    } else {
+      requiredFields
+    }
+    val safeRequiredIndices = new Array[Int](safeRequiredFields.length)
+    schemaFields.zipWithIndex.filter {
+      case (field, _) => safeRequiredFields.contains(field)
+    }.foreach {
+      case (field, index) => safeRequiredIndices(safeRequiredFields.indexOf(field)) = index
+    }
+    val requiredSize = requiredFields.length
+
+    tokenizeData(csv, options, headers).flatMap { tokens =>
+      if (options.dropMalformed && schemaFields.length != tokens.length) {
+        logWarning(s"Dropping malformed line: ${tokens.mkString(options.delimiter.toString)}")
+        None
+      } else if (options.failFast && schemaFields.length != tokens.length) {
+        throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
+          s"${tokens.mkString(options.delimiter.toString)}")
+      } else {
+        val indexSafeTokens = if (options.permissive && schemaFields.length > tokens.length) {
+          tokens ++ new Array[String](schemaFields.length - tokens.length)
+        } else if (options.permissive && schemaFields.length < tokens.length) {
+          tokens.take(schemaFields.length)
+        } else {
+          tokens
+        }
+        try {
+          val row = convertTokens(
+            indexSafeTokens,
+            safeRequiredIndices,
+            schemaFields,
+            requiredSize,
+            options)
+          Some(row)
+        } catch {
+          case NonFatal(e) if options.dropMalformed =>
+            logWarning("Parse exception. " +
+              s"Dropping malformed line: ${tokens.mkString(options.delimiter.toString)}")
+            None
+        }
+      }
+    }
+  }
+
+  /**
+   * Convert the tokens to an [[InternalRow]].
+   */
+  private def convertTokens(
+      tokens: Array[String],
+      requiredIndices: Array[Int],
+      schemaFields: Array[StructField],
+      requiredSize: Int,
+      options: CSVOptions): InternalRow = {
+    val row = new GenericMutableRow(requiredSize)
--- End diff --

Oh yes! I noticed this too. The JSON data source does the same, as far as I remember. This might have 
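To make the control flow in the diff above easier to follow, here is a minimal, dependency-free sketch of how the three parse modes treat a token row whose arity does not match the schema. `ParseModeSketch` and `handleRow` are hypothetical names for illustration, not Spark or univocity APIs:

```scala
object ParseModeSketch {
  sealed trait Mode
  case object Permissive extends Mode    // pad missing fields / drop extras
  case object DropMalformed extends Mode // silently skip the bad row
  case object FailFast extends Mode      // throw on the first bad row

  def handleRow(tokens: Array[String], schemaSize: Int, mode: Mode): Option[Array[String]] =
    if (tokens.length == schemaSize) {
      Some(tokens)
    } else mode match {
      // mirrors the `options.dropMalformed` branch in the diff
      case DropMalformed => None
      case FailFast =>
        throw new RuntimeException(s"Malformed line in FAILFAST mode: ${tokens.mkString(",")}")
      case Permissive if tokens.length < schemaSize =>
        // pad with nulls, like `tokens ++ new Array[String](...)` above
        Some(tokens ++ Array.fill[String](schemaSize - tokens.length)(null))
      case Permissive =>
        Some(tokens.take(schemaSize)) // drop surplus columns
    }
}
```

The key design point the diff encodes is that only the permissive path ever repairs the token array; the other two modes decide row fate before any conversion work is done.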

[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-215275474
  
@hvanhovell If you think it makes sense, I will change the title of this PR and the JIRA, and will add some more commits to deal with minor things (code style, etc.).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-215274872
  
@hvanhovell Thank you for a close look! I think I need to change the title 
of this issue and JIRA because "better performance" might be too broad.

The main purposes of this PR were:
 - Refactoring this to be consistent with the JSON data source
 - Removing the unnecessary conversion from `Iterator` to `Reader`.

Could I please handle these in separate PRs or follow-ups if it makes sense?





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61359944
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * Converts a CSV string to a sequence of strings
+ */
+private[csv] object UnivocityParser extends Logging {
+  /**
+   * Convert the input iterator to an iterator of [[InternalRow]]
+   */
+  def parseCsv(
+  iter: Iterator[String],
+  schema: StructType,
+  requiredSchema: StructType,
+  headers: Array[String],
+  shouldDropHeader: Boolean,
+  options: CSVOptions): Iterator[InternalRow] = {
+if (shouldDropHeader) {
+  CSVUtils.dropHeaderLine(iter, options)
+}
+val csv = CSVUtils.filterCommentAndEmpty(iter, options)
+
+val schemaFields = schema.fields
+val requiredFields = requiredSchema.fields
+val safeRequiredFields = if (options.dropMalformed) {
+  // If `dropMalformed` is enabled, then it needs to parse all the values
+  // so that we can decide which row is malformed.
+  requiredFields ++ schemaFields.filterNot(requiredFields.contains(_))
+} else {
+  requiredFields
+}
+val safeRequiredIndices = new Array[Int](safeRequiredFields.length)
+schemaFields.zipWithIndex.filter {
+  case (field, _) => safeRequiredFields.contains(field)
+}.foreach {
+  case (field, index) => safeRequiredIndices(safeRequiredFields.indexOf(field)) = index
+}
+val requiredSize = requiredFields.length
+
+tokenizeData(csv, options, headers).flatMap { tokens =>
+  if (options.dropMalformed && schemaFields.length != tokens.length) {
+logWarning(s"Dropping malformed line: ${tokens.mkString(options.delimiter.toString)}")
+None
+  } else if (options.failFast && schemaFields.length != tokens.length) {
+throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
+  s"${tokens.mkString(options.delimiter.toString)}")
+  } else {
+val indexSafeTokens = if (options.permissive && schemaFields.length > tokens.length) {
+  tokens ++ new Array[String](schemaFields.length - tokens.length)
+} else if (options.permissive && schemaFields.length < tokens.length) {
+  tokens.take(schemaFields.length)
--- End diff --

Oh, I haven't tested this yet, but I am sure this will work without this 
logic anyway; still, I think it is safe to slice the tokens here.

The size of `tokens` can be larger than that of `schemaFields`. I can remove this 
logic if you feel strongly that it is odd, but I feel like it might be okay to just leave it.
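Since this thread keeps coming back to the permissive-mode pad/truncate behavior, here is a minimal standalone sketch of that normalization; `normalizeTokens` is a hypothetical helper written for illustration, not the actual Spark code:

```scala
// Minimal sketch (hypothetical helper, not Spark's implementation) of the
// permissive-mode normalization discussed above: pad short rows with nulls
// and truncate long rows so the token array always matches the schema width.
def normalizeTokens(tokens: Array[String], schemaSize: Int): Array[String] = {
  if (schemaSize > tokens.length) {
    // Short row: append nulls so that every schema field has a slot.
    tokens ++ new Array[String](schemaSize - tokens.length)
  } else if (schemaSize < tokens.length) {
    // Long row: drop the extra trailing tokens.
    tokens.take(schemaSize)
  } else {
    tokens
  }
}
```

For example, `normalizeTokens(Array("a", "b"), 3)` yields `Array("a", "b", null)`, while `normalizeTokens(Array("a", "b", "c"), 2)` yields `Array("a", "b")`.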





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61359353
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * Converts a CSV string to a sequence of strings
+ */
+private[csv] object UnivocityParser extends Logging {
+  /**
+   * Convert the input iterator to an iterator of [[InternalRow]]
+   */
+  def parseCsv(
+  iter: Iterator[String],
+  schema: StructType,
+  requiredSchema: StructType,
+  headers: Array[String],
+  shouldDropHeader: Boolean,
+  options: CSVOptions): Iterator[InternalRow] = {
+if (shouldDropHeader) {
+  CSVUtils.dropHeaderLine(iter, options)
+}
+val csv = CSVUtils.filterCommentAndEmpty(iter, options)
+
+val schemaFields = schema.fields
+val requiredFields = requiredSchema.fields
+val safeRequiredFields = if (options.dropMalformed) {
+  // If `dropMalformed` is enabled, then it needs to parse all the values
+  // so that we can decide which row is malformed.
+  requiredFields ++ schemaFields.filterNot(requiredFields.contains(_))
+} else {
+  requiredFields
+}
+val safeRequiredIndices = new Array[Int](safeRequiredFields.length)
+schemaFields.zipWithIndex.filter {
+  case (field, _) => safeRequiredFields.contains(field)
+}.foreach {
+  case (field, index) => safeRequiredIndices(safeRequiredFields.indexOf(field)) = index
+}
+val requiredSize = requiredFields.length
+
+tokenizeData(csv, options, headers).flatMap { tokens =>
+  if (options.dropMalformed && schemaFields.length != tokens.length) {
+logWarning(s"Dropping malformed line: ${tokens.mkString(options.delimiter.toString)}")
+None
+  } else if (options.failFast && schemaFields.length != tokens.length) {
+throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
+  s"${tokens.mkString(options.delimiter.toString)}")
+  } else {
+val indexSafeTokens = if (options.permissive && schemaFields.length > tokens.length) {
+  tokens ++ new Array[String](schemaFields.length - tokens.length)
--- End diff --

Thanks for pointing this out. I will think about this further. Maybe I 
could do this in a separate PR if you think that is sensible. The code was 
copied from the original.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61359253
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * Converts a CSV string to a sequence of strings
+ */
+private[csv] object UnivocityParser extends Logging {
+  /**
+   * Convert the input iterator to an iterator of [[InternalRow]]
+   */
+  def parseCsv(
+  iter: Iterator[String],
+  schema: StructType,
+  requiredSchema: StructType,
+  headers: Array[String],
+  shouldDropHeader: Boolean,
+  options: CSVOptions): Iterator[InternalRow] = {
+if (shouldDropHeader) {
+  CSVUtils.dropHeaderLine(iter, options)
+}
+val csv = CSVUtils.filterCommentAndEmpty(iter, options)
+
+val schemaFields = schema.fields
+val requiredFields = requiredSchema.fields
+val safeRequiredFields = if (options.dropMalformed) {
+  // If `dropMalformed` is enabled, then it needs to parse all the values
+  // so that we can decide which row is malformed.
+  requiredFields ++ schemaFields.filterNot(requiredFields.contains(_))
+} else {
+  requiredFields
+}
+val safeRequiredIndices = new Array[Int](safeRequiredFields.length)
+schemaFields.zipWithIndex.filter {
+  case (field, _) => safeRequiredFields.contains(field)
+}.foreach {
+  case (field, index) => safeRequiredIndices(safeRequiredFields.indexOf(field)) = index
+}
+val requiredSize = requiredFields.length
+
+tokenizeData(csv, options, headers).flatMap { tokens =>
+  if (options.dropMalformed && schemaFields.length != tokens.length) {
+logWarning(s"Dropping malformed line: ${tokens.mkString(options.delimiter.toString)}")
+None
+  } else if (options.failFast && schemaFields.length != tokens.length) {
+throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
+  s"${tokens.mkString(options.delimiter.toString)}")
+  } else {
+val indexSafeTokens = if (options.permissive && schemaFields.length > tokens.length) {
+  tokens ++ new Array[String](schemaFields.length - tokens.length)
+} else if (options.permissive && schemaFields.length < tokens.length) {
+  tokens.take(schemaFields.length)
+} else {
+  tokens
+}
+try {
+  val row = convertTokens(
+indexSafeTokens,
+safeRequiredIndices,
+schemaFields,
+requiredSize,
+options)
+  Some(row)
+} catch {
+  case NonFatal(e) if options.dropMalformed =>
+logWarning("Parse exception. " +
+  s"Dropping malformed line: ${tokens.mkString(options.delimiter.toString)}")
+None
+}
+  }
+}
+  }
+
+  /**
+   * Convert the tokens to [[InternalRow]]
+   */
+  private def convertTokens(
--- End diff --

I see. Could I do this as well in a separate PR, to keep this one focused? 
The code was just copied from the original, and I just extracted a function 
with a name consistent with the JSON data source's `convertX()`.



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61359080
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityGenerator.scala
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import com.univocity.parsers.csv.{CsvWriter, CsvWriterSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types.StructType
+
+/**
+ * Converts a sequence of strings to a CSV string
+ */
+private[csv] object UnivocityGenerator extends Logging {
+  /**
+   * Transforms a single InternalRow to CSV using Univocity
+   *
+   * @param rowSchema the schema object used for conversion
+   * @param writer a CsvWriter object
+   * @param headers headers to write
+   * @param writeHeader true if it needs to write header
+   * @param options CSVOptions object containing options
+   * @param row The row to convert
+   */
+  def apply(
+  rowSchema: StructType,
+  writer: CsvWriter,
+  headers: Array[String],
+  writeHeader: Boolean,
+  options: CSVOptions)(row: InternalRow): Unit = {
+val tokens = {
+  row.toSeq(rowSchema).map { field =>
--- End diff --

Thank you! Could I maybe do this in a separate PR, to keep this one focused? 
This was just copied from the original code.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r6135
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala
 ---
@@ -17,152 +17,162 @@
 
 package org.apache.spark.sql.execution.datasources.csv
 
-import scala.util.control.NonFatal
-
-import org.apache.hadoop.fs.Path
-import org.apache.hadoop.io.{NullWritable, Text}
-import org.apache.hadoop.mapreduce.RecordWriter
-import org.apache.hadoop.mapreduce.TaskAttemptContext
+import java.io.CharArrayWriter
+import java.nio.charset.{Charset, StandardCharsets}
+
+import com.univocity.parsers.csv.CsvWriter
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, Path}
+import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
+import org.apache.hadoop.mapred.TextInputFormat
+import org.apache.hadoop.mapreduce.{Job, RecordWriter, TaskAttemptContext}
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
 
 import org.apache.spark.internal.Logging
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
-import org.apache.spark.sql.execution.datasources.{OutputWriter, OutputWriterFactory, PartitionedFile}
+import org.apache.spark.sql.catalyst.expressions.JoinedRow
+import org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection
+import org.apache.spark.sql.execution.datasources._
+import org.apache.spark.sql.sources._
 import org.apache.spark.sql.types._
+import org.apache.spark.util.SerializableConfiguration
 
-object CSVRelation extends Logging {
-
-  def univocityTokenizer(
-  file: RDD[String],
-  header: Seq[String],
-  firstLine: String,
-  params: CSVOptions): RDD[Array[String]] = {
-// If header is set, make sure firstLine is materialized before sending to executors.
-file.mapPartitions { iter =>
-  new BulkCsvReader(
-if (params.headerFlag) iter.filterNot(_ == firstLine) else iter,
-params,
-headers = header)
-}
-  }
+/**
+ * Provides access to CSV data from pure SQL statements.
+ */
+class DefaultSource extends FileFormat with DataSourceRegister {
+
+  override def shortName(): String = "csv"
+
+  override def toString: String = "CSV"
+
+  override def hashCode(): Int = getClass.hashCode()
+
+  override def equals(other: Any): Boolean = other.isInstanceOf[DefaultSource]
 
-  def csvParser(
-  schema: StructType,
-  requiredColumns: Array[String],
-  params: CSVOptions): Array[String] => Option[InternalRow] = {
-val schemaFields = schema.fields
-val requiredFields = StructType(requiredColumns.map(schema(_))).fields
-val safeRequiredFields = if (params.dropMalformed) {
-  // If `dropMalformed` is enabled, then it needs to parse all the values
-  // so that we can decide which row is malformed.
-  requiredFields ++ schemaFields.filterNot(requiredFields.contains(_))
+  override def inferSchema(
+  sparkSession: SparkSession,
+  options: Map[String, String],
+  files: Seq[FileStatus]): Option[StructType] = {
+val csvOptions = new CSVOptions(options)
+
+// TODO: Move filtering.
+val paths = files.filterNot(_.getPath.getName startsWith "_").map(_.getPath.toString)
+val rdd = createBaseRdd(sparkSession, csvOptions, paths)
+val schema = if (csvOptions.inferSchemaFlag) {
+  InferSchema.infer(rdd, csvOptions)
 } else {
-  requiredFields
-}
-val safeRequiredIndices = new Array[Int](safeRequiredFields.length)
-schemaFields.zipWithIndex.filter {
-  case (field, _) => safeRequiredFields.contains(field)
-}.foreach {
-  case (field, index) => safeRequiredIndices(safeRequiredFields.indexOf(field)) = index
-}
-val requiredSize = requiredFields.length
-val row = new GenericMutableRow(requiredSize)
-
-(tokens: Array[String]) => {
-  if (params.dropMalformed && schemaFields.length != tokens.length) {
-logWarning(s"Dropping malformed line: ${tokens.mkString(params.delimiter.toString)}")
-None
-  } else if (params.failFast && schemaFields.length != tokens.length) {
-throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
-  s"${tokens.mkString(params.delimiter.toString)}")
+  // By default fields are assumed to be StringType
+  val filteredRdd = rdd.mapPartitions(CSVUtils.filterCommentAndEmpty(_, csvOptions))
+  

[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61358822
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala
 ---
@@ -17,152 +17,162 @@
 
 package org.apache.spark.sql.execution.datasources.csv
 
-import scala.util.control.NonFatal
-
-import org.apache.hadoop.fs.Path
-import org.apache.hadoop.io.{NullWritable, Text}
-import org.apache.hadoop.mapreduce.RecordWriter
-import org.apache.hadoop.mapreduce.TaskAttemptContext
+import java.io.CharArrayWriter
+import java.nio.charset.{Charset, StandardCharsets}
+
+import com.univocity.parsers.csv.CsvWriter
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, Path}
+import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
+import org.apache.hadoop.mapred.TextInputFormat
+import org.apache.hadoop.mapreduce.{Job, RecordWriter, TaskAttemptContext}
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
 
 import org.apache.spark.internal.Logging
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
-import org.apache.spark.sql.execution.datasources.{OutputWriter, OutputWriterFactory, PartitionedFile}
+import org.apache.spark.sql.catalyst.expressions.JoinedRow
+import org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection
+import org.apache.spark.sql.execution.datasources._
+import org.apache.spark.sql.sources._
 import org.apache.spark.sql.types._
+import org.apache.spark.util.SerializableConfiguration
 
-object CSVRelation extends Logging {
-
-  def univocityTokenizer(
-  file: RDD[String],
-  header: Seq[String],
-  firstLine: String,
-  params: CSVOptions): RDD[Array[String]] = {
-// If header is set, make sure firstLine is materialized before sending to executors.
-file.mapPartitions { iter =>
-  new BulkCsvReader(
-if (params.headerFlag) iter.filterNot(_ == firstLine) else iter,
-params,
-headers = header)
-}
-  }
+/**
+ * Provides access to CSV data from pure SQL statements.
+ */
+class DefaultSource extends FileFormat with DataSourceRegister {
+
+  override def shortName(): String = "csv"
+
+  override def toString: String = "CSV"
+
+  override def hashCode(): Int = getClass.hashCode()
+
+  override def equals(other: Any): Boolean = other.isInstanceOf[DefaultSource]
 
-  def csvParser(
-  schema: StructType,
-  requiredColumns: Array[String],
-  params: CSVOptions): Array[String] => Option[InternalRow] = {
-val schemaFields = schema.fields
-val requiredFields = StructType(requiredColumns.map(schema(_))).fields
-val safeRequiredFields = if (params.dropMalformed) {
-  // If `dropMalformed` is enabled, then it needs to parse all the values
-  // so that we can decide which row is malformed.
-  requiredFields ++ schemaFields.filterNot(requiredFields.contains(_))
+  override def inferSchema(
+  sparkSession: SparkSession,
+  options: Map[String, String],
+  files: Seq[FileStatus]): Option[StructType] = {
+val csvOptions = new CSVOptions(options)
+
+// TODO: Move filtering.
+val paths = files.filterNot(_.getPath.getName startsWith "_").map(_.getPath.toString)
--- End diff --

I see; I cannot guarantee that. The JSON data source also skips files where 
`name.startsWith("_") || name.startsWith(".")`, so let me follow that first. 
Can I maybe do this together with the JSON data source, after figuring it out, 
in a separate PR or a follow-up?
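For reference, the skip rule mentioned above is just a name predicate; a hypothetical standalone version (an illustration, not the actual Spark code) might look like:

```scala
// Hypothetical sketch of the metadata-file filter discussed above: the JSON
// source skips files whose base name starts with "_" or "." (for example
// _SUCCESS markers or hidden files), while the CSV diff here only skips "_".
def isDataFile(fileName: String): Boolean =
  !fileName.startsWith("_") && !fileName.startsWith(".")
```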





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61358639
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * Converts a CSV string to a sequence of strings
+ */
+private[csv] object UnivocityParser extends Logging {
--- End diff --

The name was also taken from the JSON data source's `JacksonParser`. If this 
looks problematic, can I rename them together with the JSON data source in a 
follow-up or another PR, if that is sensible?





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61358548
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityGenerator.scala
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import com.univocity.parsers.csv.{CsvWriter, CsvWriterSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types.StructType
+
+/**
+ * Converts a sequence of strings to a CSV string
+ */
+private[csv] object UnivocityGenerator extends Logging {
+  /**
+   * Transforms a single InternalRow to CSV using Univocity
+   *
+   * @param rowSchema the schema object used for conversion
+   * @param writer a CsvWriter object
+   * @param headers headers to write
+   * @param writeHeader true if it needs to write header
+   * @param options CSVOptions object containing options
+   * @param row The row to convert
+   */
+  def apply(
--- End diff --

The name was also taken from the JSON data source's `JacksonGenerator`.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61358501
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityGenerator.scala
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import com.univocity.parsers.csv.{CsvWriter, CsvWriterSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types.StructType
+
+/**
+ * Converts a sequence of strings to a CSV string
+ */
+private[csv] object UnivocityGenerator extends Logging {
--- End diff --

Thanks! The name was also taken from the JSON data source's 
`JacksonGenerator`. Maybe I can rename them together once this one is merged.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61358445
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/InferSchema.scala
 ---
@@ -30,22 +30,37 @@ import org.apache.spark.sql.catalyst.util.DateTimeUtils
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.UTF8String
 
-private[csv] object CSVInferSchema {
+private[csv] object InferSchema {
 
   /**
* Similar to the JSON schema inference
* 1. Infer type of each row
* 2. Merge row types to find common type
* 3. Replace any null types with string type
*/
-  def infer(
-  tokenRdd: RDD[Array[String]],
-  header: Array[String],
-  nullValue: String = ""): StructType = {
+  def infer(csv: RDD[String], options: CSVOptions): StructType = {
--- End diff --

Actually, `DefaultSource.inferSchema` does call this method. I 
intentionally gave it the same structure as `JSONRelation`; the JSON data source 
also has a class with the same name and the same method, so that future issues 
can be fixed in both together. (Actually, the main motivation for this 
refactoring is the inconsistency between the two structures, even though they 
could be almost identical.)
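The three-step inference the scaladoc above describes (infer a type per row, merge the row types, replace remaining nulls with string) can be sketched as follows. This is an illustrative Python sketch with a deliberately simplified type lattice; the function names are hypothetical, not Spark's actual `InferSchema` logic.

```python
def infer_field(token):
    """Guess the narrowest type for a single CSV token."""
    if token is None or token == "":
        return "null"
    for caster, name in ((int, "integer"), (float, "double")):
        try:
            caster(token)
            return name
        except ValueError:
            pass
    return "string"

def merge_types(a, b):
    """Find a common type for two per-row types (simplified widening lattice)."""
    order = ["null", "integer", "double", "string"]
    return a if order.index(a) >= order.index(b) else b

def infer_schema(rows):
    merged = ["null"] * len(rows[0])
    for row in rows:                       # step 1 + 2: infer and merge
        for i, tok in enumerate(row):
            merged[i] = merge_types(merged[i], infer_field(tok))
    # step 3: any column that stayed null falls back to string
    return [t if t != "null" else "string" for t in merged]

print(infer_schema([["1", "a", ""], ["2.5", "b", ""]]))
# ['double', 'string', 'string']
```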





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61269972
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * Converts a CSV string to a sequence of strings
+ */
+private[csv] object UnivocityParser extends Logging {
+  /**
+   * Convert the input iterator to an iterator of [[InternalRow]]s
+   */
+  def parseCsv(
+      iter: Iterator[String],
+      schema: StructType,
+      requiredSchema: StructType,
+      headers: Array[String],
+      shouldDropHeader: Boolean,
+      options: CSVOptions): Iterator[InternalRow] = {
+    if (shouldDropHeader) {
+      CSVUtils.dropHeaderLine(iter, options)
+    }
+    val csv = CSVUtils.filterCommentAndEmpty(iter, options)
+
+    val schemaFields = schema.fields
+    val requiredFields = requiredSchema.fields
+    val safeRequiredFields = if (options.dropMalformed) {
+      // If `dropMalformed` is enabled, then it needs to parse all the values
+      // so that we can decide which row is malformed.
+      requiredFields ++ schemaFields.filterNot(requiredFields.contains(_))
+    } else {
+      requiredFields
+    }
+    val safeRequiredIndices = new Array[Int](safeRequiredFields.length)
+    schemaFields.zipWithIndex.filter {
+      case (field, _) => safeRequiredFields.contains(field)
+    }.foreach {
+      case (field, index) => safeRequiredIndices(safeRequiredFields.indexOf(field)) = index
+    }
+    val requiredSize = requiredFields.length
+
+    tokenizeData(csv, options, headers).flatMap { tokens =>
+      if (options.dropMalformed && schemaFields.length != tokens.length) {
+        logWarning(s"Dropping malformed line: ${tokens.mkString(options.delimiter.toString)}")
+        None
+      } else if (options.failFast && schemaFields.length != tokens.length) {
+        throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
+          s"${tokens.mkString(options.delimiter.toString)}")
+      } else {
+        val indexSafeTokens = if (options.permissive && schemaFields.length > tokens.length) {
+          tokens ++ new Array[String](schemaFields.length - tokens.length)
+        } else if (options.permissive && schemaFields.length < tokens.length) {
+          tokens.take(schemaFields.length)
+        } else {
+          tokens
+        }
+        try {
+          val row = convertTokens(
+            indexSafeTokens,
+            safeRequiredIndices,
+            schemaFields,
+            requiredSize,
+            options)
+          Some(row)
+        } catch {
+          case NonFatal(e) if options.dropMalformed =>
+            logWarning("Parse exception. " +
+              s"Dropping malformed line: ${tokens.mkString(options.delimiter.toString)}")
+            None
+        }
+      }
+    }
+  }
+
+  /**
+   * Convert the tokens to [[InternalRow]]
+   */
+  private def convertTokens(
+      tokens: Array[String],
+      requiredIndices: Array[Int],
+      schemaFields: Array[StructField],
+      requiredSize: Int,
--- End diff --

Never mind, I got it.
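The malformed-line handling in the quoted `flatMap` amounts to three policies. A hedged Python sketch follows; the mode names mirror the CSV options discussed in this thread (`failFast`, `dropMalformed`, `permissive`), but the helper itself is hypothetical.

```python
def handle_tokens(tokens, schema_len, mode):
    """Apply one of the three parse modes to a row with the wrong arity."""
    if len(tokens) != schema_len:
        if mode == "DROPMALFORMED":
            return None                   # drop the row (Spark also logs a warning)
        if mode == "FAILFAST":
            raise RuntimeError("Malformed line in FAILFAST mode: " + ",".join(tokens))
        # PERMISSIVE: pad short rows with nulls, truncate long ones
        if len(tokens) < schema_len:
            return tokens + [None] * (schema_len - len(tokens))
        return tokens[:schema_len]
    return tokens

print(handle_tokens(["a", "b"], 3, "PERMISSIVE"))      # ['a', 'b', None]
print(handle_tokens(["a", "b"], 3, "DROPMALFORMED"))   # None
```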



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread hvanhovell
Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-215109697
  
@HyukjinKwon I have taken a pass. The PR looks pretty solid. I do think we 
can make it a bit more concise in some places, and I do think we can make it a 
bit faster as well. Let me know what you think.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61271578
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * Converts a CSV string to a sequence of strings
+ */
+private[csv] object UnivocityParser extends Logging {
+  /**
+   * Convert the input iterator to an iterator of [[InternalRow]]s
+   */
+  def parseCsv(
+  iter: Iterator[String],
+  schema: StructType,
+  requiredSchema: StructType,
+  headers: Array[String],
+  shouldDropHeader: Boolean,
+  options: CSVOptions): Iterator[InternalRow] = {
+if (shouldDropHeader) {
+  CSVUtils.dropHeaderLine(iter, options)
+}
+val csv = CSVUtils.filterCommentAndEmpty(iter, options)
+
+val schemaFields = schema.fields
+val requiredFields = requiredSchema.fields
+val safeRequiredFields = if (options.dropMalformed) {
+  // If `dropMalformed` is enabled, then it needs to parse all the 
values
+  // so that we can decide which row is malformed.
+  requiredFields ++ schemaFields.filterNot(requiredFields.contains(_))
+} else {
+  requiredFields
+}
+val safeRequiredIndices = new Array[Int](safeRequiredFields.length)
+schemaFields.zipWithIndex.filter {
+  case (field, _) => safeRequiredFields.contains(field)
+}.foreach {
+  case (field, index) => 
safeRequiredIndices(safeRequiredFields.indexOf(field)) = index
+}
+val requiredSize = requiredFields.length
+
+tokenizeData(csv, options, headers).flatMap { tokens =>
+  if (options.dropMalformed && schemaFields.length != tokens.length) {
+logWarning(s"Dropping malformed line: 
${tokens.mkString(options.delimiter.toString)}")
+None
+  } else if (options.failFast && schemaFields.length != tokens.length) 
{
+throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
+  s"${tokens.mkString(options.delimiter.toString)}")
+  } else {
+val indexSafeTokens = if (options.permissive && 
schemaFields.length > tokens.length) {
+  tokens ++ new Array[String](schemaFields.length - tokens.length)
+} else if (options.permissive && schemaFields.length < 
tokens.length) {
+  tokens.take(schemaFields.length)
+} else {
+  tokens
+}
+try {
+  val row = convertTokens(
+indexSafeTokens,
+safeRequiredIndices,
+schemaFields,
+requiredSize,
+options)
+  Some(row)
+} catch {
+  case NonFatal(e) if options.dropMalformed =>
+logWarning("Parse exception. " +
+  s"Dropping malformed line: 
${tokens.mkString(options.delimiter.toString)}")
+None
+}
+  }
+}
+  }
+
+  /**
+   * Convert the tokens to [[InternalRow]]
+   */
+  private def convertTokens(
+  tokens: Array[String],
+  requiredIndices: Array[Int],
+  schemaFields: Array[StructField],
+  requiredSize: Int,
+  options: CSVOptions): InternalRow = {
+val row = new GenericMutableRow(requiredSize)
--- End diff --

I am not sure about datasources, but in a lot of places within SparkSQL we 
just return update a 

[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61271158
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * Converts a CSV string to a sequence of strings
+ */
+private[csv] object UnivocityParser extends Logging {
+  /**
+   * Convert the input iterator to an iterator of [[InternalRow]]s
+   */
+  def parseCsv(
+  iter: Iterator[String],
+  schema: StructType,
+  requiredSchema: StructType,
+  headers: Array[String],
+  shouldDropHeader: Boolean,
+  options: CSVOptions): Iterator[InternalRow] = {
+if (shouldDropHeader) {
+  CSVUtils.dropHeaderLine(iter, options)
+}
+val csv = CSVUtils.filterCommentAndEmpty(iter, options)
+
+val schemaFields = schema.fields
+val requiredFields = requiredSchema.fields
+val safeRequiredFields = if (options.dropMalformed) {
+  // If `dropMalformed` is enabled, then it needs to parse all the 
values
+  // so that we can decide which row is malformed.
+  requiredFields ++ schemaFields.filterNot(requiredFields.contains(_))
+} else {
+  requiredFields
+}
+val safeRequiredIndices = new Array[Int](safeRequiredFields.length)
+schemaFields.zipWithIndex.filter {
+  case (field, _) => safeRequiredFields.contains(field)
+}.foreach {
+  case (field, index) => 
safeRequiredIndices(safeRequiredFields.indexOf(field)) = index
+}
+val requiredSize = requiredFields.length
+
+tokenizeData(csv, options, headers).flatMap { tokens =>
+  if (options.dropMalformed && schemaFields.length != tokens.length) {
+logWarning(s"Dropping malformed line: 
${tokens.mkString(options.delimiter.toString)}")
+None
+  } else if (options.failFast && schemaFields.length != tokens.length) 
{
+throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
+  s"${tokens.mkString(options.delimiter.toString)}")
+  } else {
+val indexSafeTokens = if (options.permissive && 
schemaFields.length > tokens.length) {
+  tokens ++ new Array[String](schemaFields.length - tokens.length)
+} else if (options.permissive && schemaFields.length < 
tokens.length) {
+  tokens.take(schemaFields.length)
--- End diff --

Why do we want this? `convertTokens` can't read beyond `schemaFields.length`, right?
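A small sketch of the reviewer's point: if the converter only ever indexes positions `0..schema_len-1`, trailing extra tokens are simply never read, so explicit truncation would be redundant. Illustrative Python with hypothetical names, not Spark code.

```python
def convert_tokens(tokens, schema_len):
    # Reads at most schema_len entries; any extra tokens are ignored
    # without being copied or truncated first.
    return [tokens[i] for i in range(schema_len)]

print(convert_tokens(["a", "b", "c", "d"], 3))  # ['a', 'b', 'c']
```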





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61267542
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * Converts a CSV string to a sequence of strings
+ */
+private[csv] object UnivocityParser extends Logging {
+  /**
+   * Convert the input iterator to an iterator of [[InternalRow]]s
+   */
+  def parseCsv(
+  iter: Iterator[String],
+  schema: StructType,
+  requiredSchema: StructType,
+  headers: Array[String],
+  shouldDropHeader: Boolean,
+  options: CSVOptions): Iterator[InternalRow] = {
+if (shouldDropHeader) {
+  CSVUtils.dropHeaderLine(iter, options)
+}
+val csv = CSVUtils.filterCommentAndEmpty(iter, options)
+
+val schemaFields = schema.fields
+val requiredFields = requiredSchema.fields
+val safeRequiredFields = if (options.dropMalformed) {
+  // If `dropMalformed` is enabled, then it needs to parse all the 
values
+  // so that we can decide which row is malformed.
+  requiredFields ++ schemaFields.filterNot(requiredFields.contains(_))
+} else {
+  requiredFields
+}
+val safeRequiredIndices = new Array[Int](safeRequiredFields.length)
+schemaFields.zipWithIndex.filter {
+  case (field, _) => safeRequiredFields.contains(field)
+}.foreach {
+  case (field, index) => 
safeRequiredIndices(safeRequiredFields.indexOf(field)) = index
+}
+val requiredSize = requiredFields.length
+
+tokenizeData(csv, options, headers).flatMap { tokens =>
+  if (options.dropMalformed && schemaFields.length != tokens.length) {
+logWarning(s"Dropping malformed line: 
${tokens.mkString(options.delimiter.toString)}")
+None
+  } else if (options.failFast && schemaFields.length != tokens.length) 
{
+throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
+  s"${tokens.mkString(options.delimiter.toString)}")
+  } else {
+val indexSafeTokens = if (options.permissive && 
schemaFields.length > tokens.length) {
+  tokens ++ new Array[String](schemaFields.length - tokens.length)
+} else if (options.permissive && schemaFields.length < 
tokens.length) {
+  tokens.take(schemaFields.length)
+} else {
+  tokens
+}
+try {
+  val row = convertTokens(
+indexSafeTokens,
+safeRequiredIndices,
+schemaFields,
+requiredSize,
+options)
+  Some(row)
+} catch {
+  case NonFatal(e) if options.dropMalformed =>
+logWarning("Parse exception. " +
+  s"Dropping malformed line: 
${tokens.mkString(options.delimiter.toString)}")
+None
+}
+  }
+}
+  }
+
+  /**
+   * Convert the tokens to [[InternalRow]]
+   */
+  private def convertTokens(
+  tokens: Array[String],
+  requiredIndices: Array[Int],
+  schemaFields: Array[StructField],
+  requiredSize: Int,
--- End diff --

Can an entry in `requiredIndices` lie outside the `requiredSize` range? 
If so, why?
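Reading the quoted code, entries of `safeRequiredIndices` beyond `requiredSize` appear only under `dropMalformed`, where the extra fields are parsed (so conversion errors can surface) but not stored. A hedged Python sketch of that bookkeeping, with hypothetical names:

```python
def build_indices(schema, required, drop_malformed=False):
    # Under DROPMALFORMED, every remaining schema field is appended so
    # malformed values in non-required columns still get parsed.
    safe_required = (required + [f for f in schema if f not in required]
                     if drop_malformed else required)
    # Each safe-required field's position within the full schema.
    return [schema.index(f) for f in safe_required], len(required)

def convert(tokens, indices, required_size):
    row = [None] * required_size
    for out_pos, schema_pos in enumerate(indices):
        value = tokens[schema_pos]        # "parsed" even when not stored
        if out_pos < required_size:       # entries past required_size are dropped
            row[out_pos] = value
    return row

schema = ["a", "b", "c"]
idx, size = build_indices(schema, ["c", "a"])
print(convert(["1", "2", "3"], idx, size))  # ['3', '1']
```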



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61270381
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * Converts a CSV string to a sequence of strings
+ */
+private[csv] object UnivocityParser extends Logging {
+  /**
+   * Convert the input iterator to an iterator of [[InternalRow]]s
+   */
+  def parseCsv(
+  iter: Iterator[String],
+  schema: StructType,
+  requiredSchema: StructType,
+  headers: Array[String],
+  shouldDropHeader: Boolean,
+  options: CSVOptions): Iterator[InternalRow] = {
+if (shouldDropHeader) {
+  CSVUtils.dropHeaderLine(iter, options)
+}
+val csv = CSVUtils.filterCommentAndEmpty(iter, options)
+
+val schemaFields = schema.fields
+val requiredFields = requiredSchema.fields
+val safeRequiredFields = if (options.dropMalformed) {
+  // If `dropMalformed` is enabled, then it needs to parse all the 
values
+  // so that we can decide which row is malformed.
+  requiredFields ++ schemaFields.filterNot(requiredFields.contains(_))
+} else {
+  requiredFields
+}
+val safeRequiredIndices = new Array[Int](safeRequiredFields.length)
+schemaFields.zipWithIndex.filter {
+  case (field, _) => safeRequiredFields.contains(field)
+}.foreach {
+  case (field, index) => 
safeRequiredIndices(safeRequiredFields.indexOf(field)) = index
+}
+val requiredSize = requiredFields.length
+
+tokenizeData(csv, options, headers).flatMap { tokens =>
+  if (options.dropMalformed && schemaFields.length != tokens.length) {
+logWarning(s"Dropping malformed line: 
${tokens.mkString(options.delimiter.toString)}")
+None
+  } else if (options.failFast && schemaFields.length != tokens.length) 
{
+throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
+  s"${tokens.mkString(options.delimiter.toString)}")
+  } else {
+val indexSafeTokens = if (options.permissive && 
schemaFields.length > tokens.length) {
+  tokens ++ new Array[String](schemaFields.length - tokens.length)
--- End diff --

Do you think there is a way we can do this without appending an array? Using 
an extra limit in `convertTokens` is probably quicker and causes less GC.
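The reviewer's alternative can be sketched as follows: instead of materializing a padded copy of `tokens` for every short row, bound the reads and substitute null past the end. Illustrative Python; `convert_padded` and `convert_bounded` are hypothetical names.

```python
def convert_padded(tokens, schema_len):
    # Current approach: allocate a padded copy for every short row.
    padded = tokens + [None] * (schema_len - len(tokens))
    return [padded[i] for i in range(schema_len)]

def convert_bounded(tokens, schema_len):
    # Suggested approach: same result, no intermediate copy per row;
    # reads are bounded by len(tokens) and missing slots become null.
    return [tokens[i] if i < len(tokens) else None for i in range(schema_len)]

assert convert_padded(["a"], 3) == convert_bounded(["a"], 3) == ["a", None, None]
```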





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61266412
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * Converts a CSV string to a sequence of strings
+ */
+private[csv] object UnivocityParser extends Logging {
+  /**
+   * Convert the input iterator to an iterator of [[InternalRow]]s
+   */
+  def parseCsv(
+  iter: Iterator[String],
+  schema: StructType,
+  requiredSchema: StructType,
+  headers: Array[String],
+  shouldDropHeader: Boolean,
+  options: CSVOptions): Iterator[InternalRow] = {
+if (shouldDropHeader) {
+  CSVUtils.dropHeaderLine(iter, options)
+}
+val csv = CSVUtils.filterCommentAndEmpty(iter, options)
+
+val schemaFields = schema.fields
+val requiredFields = requiredSchema.fields
+val safeRequiredFields = if (options.dropMalformed) {
+  // If `dropMalformed` is enabled, then it needs to parse all the 
values
+  // so that we can decide which row is malformed.
+  requiredFields ++ schemaFields.filterNot(requiredFields.contains(_))
+} else {
+  requiredFields
+}
+val safeRequiredIndices = new Array[Int](safeRequiredFields.length)
+schemaFields.zipWithIndex.filter {
+  case (field, _) => safeRequiredFields.contains(field)
+}.foreach {
+  case (field, index) => 
safeRequiredIndices(safeRequiredFields.indexOf(field)) = index
+}
+val requiredSize = requiredFields.length
+
+tokenizeData(csv, options, headers).flatMap { tokens =>
+  if (options.dropMalformed && schemaFields.length != tokens.length) {
+logWarning(s"Dropping malformed line: 
${tokens.mkString(options.delimiter.toString)}")
+None
+  } else if (options.failFast && schemaFields.length != tokens.length) 
{
+throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
+  s"${tokens.mkString(options.delimiter.toString)}")
+  } else {
+val indexSafeTokens = if (options.permissive && 
schemaFields.length > tokens.length) {
+  tokens ++ new Array[String](schemaFields.length - tokens.length)
+} else if (options.permissive && schemaFields.length < 
tokens.length) {
+  tokens.take(schemaFields.length)
+} else {
+  tokens
+}
+try {
+  val row = convertTokens(
+indexSafeTokens,
+safeRequiredIndices,
+schemaFields,
+requiredSize,
+options)
+  Some(row)
+} catch {
+  case NonFatal(e) if options.dropMalformed =>
+logWarning("Parse exception. " +
+  s"Dropping malformed line: 
${tokens.mkString(options.delimiter.toString)}")
+None
+}
+  }
+}
+  }
+
+  /**
+   * Convert the tokens to [[InternalRow]]
+   */
+  private def convertTokens(
--- End diff --

This might be a wild idea: We might be able to use an encoder here.



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61265498
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityGenerator.scala
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import com.univocity.parsers.csv.{CsvWriter, CsvWriterSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types.StructType
+
+/**
+ * Converts a sequence of strings to a CSV string
+ */
+private[csv] object UnivocityGenerator extends Logging {
--- End diff --

Come to think of it, why not integrate this with `CsvOutputWriter`?





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61265122
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityGenerator.scala
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import com.univocity.parsers.csv.{CsvWriter, CsvWriterSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types.StructType
+
+/**
+ * Converts a sequence of strings to a CSV string
+ */
+private[csv] object UnivocityGenerator extends Logging {
+  /**
+   * Transforms a single InternalRow to CSV using Univocity
+   *
+   * @param rowSchema the schema object used for conversion
+   * @param writer a CsvWriter object
+   * @param headers headers to write
+   * @param writeHeader true if it needs to write header
+   * @param options CSVOptions object containing options
+   * @param row The row to convert
+   */
+  def apply(
+  rowSchema: StructType,
+  writer: CsvWriter,
+  headers: Array[String],
+  writeHeader: Boolean,
+  options: CSVOptions)(row: InternalRow): Unit = {
+val tokens = {
+  row.toSeq(rowSchema).map { field =>
--- End diff --

You are calling this a lot, right? So it might be better not to rely on 
functional constructs here. Also take a look at the `InternalRow.toSeq` method; 
there might be some room for improvement if you just pass in the `DataType`s 
directly.
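
For illustration only, a Spark-independent sketch of the suggestion above: replacing a per-row `map` with a pre-sized array and a `while` loop. The names (`tokensFunctional`, `tokensImperative`) are hypothetical, and a plain `Array[Any]` stands in for `InternalRow`:

```scala
// Sketch: the same tokenization written functionally and imperatively.
// In a hot per-row path the imperative version avoids allocating closures
// and intermediate collections on every call.
object TokenizeSketch {
  // Functional style: concise, but builds a new Seq and a closure per call.
  def tokensFunctional(fields: Seq[Any]): Seq[String] =
    fields.map(f => if (f == null) "" else f.toString)

  // Imperative style: one pre-sized array, a bare while loop.
  def tokensImperative(fields: Array[Any]): Array[String] = {
    val out = new Array[String](fields.length)
    var i = 0
    while (i < fields.length) {
      out(i) = if (fields(i) == null) "" else fields(i).toString
      i += 1
    }
    out
  }
}
```

Both produce the same tokens; the difference only matters when the function runs once per row over millions of rows.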


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61260986
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala
 ---
@@ -17,152 +17,162 @@
 
 package org.apache.spark.sql.execution.datasources.csv
 
-import scala.util.control.NonFatal
-
-import org.apache.hadoop.fs.Path
-import org.apache.hadoop.io.{NullWritable, Text}
-import org.apache.hadoop.mapreduce.RecordWriter
-import org.apache.hadoop.mapreduce.TaskAttemptContext
+import java.io.CharArrayWriter
+import java.nio.charset.{Charset, StandardCharsets}
+
+import com.univocity.parsers.csv.CsvWriter
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, Path}
+import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
+import org.apache.hadoop.mapred.TextInputFormat
+import org.apache.hadoop.mapreduce.{Job, RecordWriter, TaskAttemptContext}
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
 
 import org.apache.spark.internal.Logging
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
-import org.apache.spark.sql.execution.datasources.{OutputWriter, 
OutputWriterFactory, PartitionedFile}
+import org.apache.spark.sql.catalyst.expressions.JoinedRow
+import 
org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection
+import org.apache.spark.sql.execution.datasources._
+import org.apache.spark.sql.sources._
 import org.apache.spark.sql.types._
+import org.apache.spark.util.SerializableConfiguration
 
-object CSVRelation extends Logging {
-
-  def univocityTokenizer(
-  file: RDD[String],
-  header: Seq[String],
-  firstLine: String,
-  params: CSVOptions): RDD[Array[String]] = {
-// If header is set, make sure firstLine is materialized before 
sending to executors.
-file.mapPartitions { iter =>
-  new BulkCsvReader(
-if (params.headerFlag) iter.filterNot(_ == firstLine) else iter,
-params,
-headers = header)
-}
-  }
+/**
+ * Provides access to CSV data from pure SQL statements.
+ */
+class DefaultSource extends FileFormat with DataSourceRegister {
+
+  override def shortName(): String = "csv"
+
+  override def toString: String = "CSV"
+
+  override def hashCode(): Int = getClass.hashCode()
+
+  override def equals(other: Any): Boolean = 
other.isInstanceOf[DefaultSource]
 
-  def csvParser(
-  schema: StructType,
-  requiredColumns: Array[String],
-  params: CSVOptions): Array[String] => Option[InternalRow] = {
-val schemaFields = schema.fields
-val requiredFields = StructType(requiredColumns.map(schema(_))).fields
-val safeRequiredFields = if (params.dropMalformed) {
-  // If `dropMalformed` is enabled, then it needs to parse all the 
values
-  // so that we can decide which row is malformed.
-  requiredFields ++ schemaFields.filterNot(requiredFields.contains(_))
+  override def inferSchema(
+  sparkSession: SparkSession,
+  options: Map[String, String],
+  files: Seq[FileStatus]): Option[StructType] = {
+val csvOptions = new CSVOptions(options)
+
+// TODO: Move filtering.
+val paths = files.filterNot(_.getPath.getName startsWith 
"_").map(_.getPath.toString)
+val rdd = createBaseRdd(sparkSession, csvOptions, paths)
+val schema = if (csvOptions.inferSchemaFlag) {
+  InferSchema.infer(rdd, csvOptions)
 } else {
-  requiredFields
-}
-val safeRequiredIndices = new Array[Int](safeRequiredFields.length)
-schemaFields.zipWithIndex.filter {
-  case (field, _) => safeRequiredFields.contains(field)
-}.foreach {
-  case (field, index) => 
safeRequiredIndices(safeRequiredFields.indexOf(field)) = index
-}
-val requiredSize = requiredFields.length
-val row = new GenericMutableRow(requiredSize)
-
-(tokens: Array[String]) => {
-  if (params.dropMalformed && schemaFields.length != tokens.length) {
-logWarning(s"Dropping malformed line: 
${tokens.mkString(params.delimiter.toString)}")
-None
-  } else if (params.failFast && schemaFields.length != tokens.length) {
-throw new RuntimeException(s"Malformed line in FAILFAST mode: " +
-  s"${tokens.mkString(params.delimiter.toString)}")
+  // By default fields are assumed to be StringType
+  val filteredRdd = 
rdd.mapPartitions(CSVUtils.filterCommentAndEmpty(_, csvOptions))
+  

[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61260702
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala
 ---
@@ -17,152 +17,162 @@
 
 package org.apache.spark.sql.execution.datasources.csv
 
-import scala.util.control.NonFatal
-
-import org.apache.hadoop.fs.Path
-import org.apache.hadoop.io.{NullWritable, Text}
-import org.apache.hadoop.mapreduce.RecordWriter
-import org.apache.hadoop.mapreduce.TaskAttemptContext
+import java.io.CharArrayWriter
+import java.nio.charset.{Charset, StandardCharsets}
+
+import com.univocity.parsers.csv.CsvWriter
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, Path}
+import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
+import org.apache.hadoop.mapred.TextInputFormat
+import org.apache.hadoop.mapreduce.{Job, RecordWriter, TaskAttemptContext}
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
 
 import org.apache.spark.internal.Logging
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
-import org.apache.spark.sql.execution.datasources.{OutputWriter, 
OutputWriterFactory, PartitionedFile}
+import org.apache.spark.sql.catalyst.expressions.JoinedRow
+import 
org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection
+import org.apache.spark.sql.execution.datasources._
+import org.apache.spark.sql.sources._
 import org.apache.spark.sql.types._
+import org.apache.spark.util.SerializableConfiguration
 
-object CSVRelation extends Logging {
-
-  def univocityTokenizer(
-  file: RDD[String],
-  header: Seq[String],
-  firstLine: String,
-  params: CSVOptions): RDD[Array[String]] = {
-// If header is set, make sure firstLine is materialized before 
sending to executors.
-file.mapPartitions { iter =>
-  new BulkCsvReader(
-if (params.headerFlag) iter.filterNot(_ == firstLine) else iter,
-params,
-headers = header)
-}
-  }
+/**
+ * Provides access to CSV data from pure SQL statements.
+ */
+class DefaultSource extends FileFormat with DataSourceRegister {
+
+  override def shortName(): String = "csv"
+
+  override def toString: String = "CSV"
+
+  override def hashCode(): Int = getClass.hashCode()
+
+  override def equals(other: Any): Boolean = 
other.isInstanceOf[DefaultSource]
 
-  def csvParser(
-  schema: StructType,
-  requiredColumns: Array[String],
-  params: CSVOptions): Array[String] => Option[InternalRow] = {
-val schemaFields = schema.fields
-val requiredFields = StructType(requiredColumns.map(schema(_))).fields
-val safeRequiredFields = if (params.dropMalformed) {
-  // If `dropMalformed` is enabled, then it needs to parse all the 
values
-  // so that we can decide which row is malformed.
-  requiredFields ++ schemaFields.filterNot(requiredFields.contains(_))
+  override def inferSchema(
+  sparkSession: SparkSession,
+  options: Map[String, String],
+  files: Seq[FileStatus]): Option[StructType] = {
+val csvOptions = new CSVOptions(options)
+
+// TODO: Move filtering.
+val paths = files.filterNot(_.getPath.getName startsWith 
"_").map(_.getPath.toString)
--- End diff --

Code style: `_.getPath.getName.startsWith("_")`?

Is it safe to skip all files whose names start with an underscore?
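
For context, Hadoop output conventionally places metadata files with a leading underscore (e.g. `_SUCCESS`) next to the data files, which is presumably what this filter targets. A tiny sketch with plain file names standing in for `FileStatus` objects (the `dataFiles` name is hypothetical):

```scala
// Sketch: keep only data files, skipping underscore-prefixed metadata
// files such as _SUCCESS or _metadata.
object PathFilterSketch {
  def dataFiles(names: Seq[String]): Seq[String] =
    names.filterNot(_.startsWith("_"))
}
```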





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61258605
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import scala.util.control.NonFatal
+
+import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * Converts CSV string to a sequence of string
+ */
+private[csv] object UnivocityParser extends Logging {
--- End diff --

Again, naming: at least add `Csv` to the name.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61258460
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityGenerator.scala
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import com.univocity.parsers.csv.{CsvWriter, CsvWriterSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types.StructType
+
+/**
+ * Converts a sequence of string to CSV string
+ */
+private[csv] object UnivocityGenerator extends Logging {
+  /**
+   * Transforms a single InternalRow to CSV using Univocity
+   *
+   * @param rowSchema the schema object used for conversion
+   * @param writer a CsvWriter object
+   * @param headers headers to write
+   * @param writeHeader true if it needs to write header
+   * @param options CSVOptions object containing options
+   * @param row The row to convert
+   */
+  def apply(
--- End diff --

Please use a more descriptive name, e.g. `writeToCsv`?





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61258326
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityGenerator.scala
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import com.univocity.parsers.csv.{CsvWriter, CsvWriterSettings}
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types.StructType
+
+/**
+ * Converts a sequence of string to CSV string
+ */
+private[csv] object UnivocityGenerator extends Logging {
--- End diff --

Are we ever going to use a different generator? Why not call it 
`CsvGenerator`?





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-27 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/12268#discussion_r61258077
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/InferSchema.scala
 ---
@@ -30,22 +30,37 @@ import org.apache.spark.sql.catalyst.util.DateTimeUtils
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.UTF8String
 
-private[csv] object CSVInferSchema {
+private[csv] object InferSchema {
 
   /**
* Similar to the JSON schema inference
* 1. Infer type of each row
* 2. Merge row types to find common type
* 3. Replace any null types with string type
*/
-  def infer(
-  tokenRdd: RDD[Array[String]],
-  header: Array[String],
-  nullValue: String = ""): StructType = {
+  def infer(csv: RDD[String], options: CSVOptions): StructType = {
--- End diff --

This looks very similar to `DefaultSource.inferSchema`; why not move the 
common functionality into a single method?
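
The three steps in the quoted scaladoc (infer a type per field, merge row types into a common type, replace remaining nulls with string) can be sketched without Spark using a toy type lattice. All names and the lattice itself (`TInt` widens to `TDouble` widens to `TString`, with `TNull` as the identity) are illustrative, not Spark's actual `DataType` hierarchy:

```scala
// Sketch of CSV schema inference over string tokens.
object InferSketch {
  sealed trait T
  case object TNull extends T
  case object TInt extends T
  case object TDouble extends T
  case object TString extends T

  // Step 1: infer the type of a single token.
  def inferField(token: String): T =
    if (token.isEmpty) TNull
    else if (token.forall(_.isDigit)) TInt
    else if (scala.util.Try(token.toDouble).isSuccess) TDouble
    else TString

  // Step 2: merge two inferred types into their common (wider) type.
  def mergeTypes(a: T, b: T): T = (a, b) match {
    case (TNull, t)                        => t
    case (t, TNull)                        => t
    case (x, y) if x == y                  => x
    case (TInt, TDouble) | (TDouble, TInt) => TDouble
    case _                                 => TString
  }

  // Steps 1-3 over all rows: fold per-column types together, then
  // replace any column that stayed TNull with TString.
  def inferSchema(rows: Seq[Array[String]], width: Int): Seq[T] = {
    val start = Seq.fill[T](width)(TNull)
    val merged = rows.foldLeft(start) { (acc, row) =>
      acc.zip(row).map { case (t, tok) => mergeTypes(t, inferField(tok)) }
    }
    merged.map { case TNull => TString; case t => t }
  }
}
```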





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214939729
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214939730
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57058/
Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214939534
  
**[Test build #57058 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57058/consoleFull)**
 for PR 12268 at commit 
[`ee71064`](https://github.com/apache/spark/commit/ee7106416ef17e5168a91bab044c6f6db9dbd53b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MultivariateGaussian(`
  * `class DecisionTreeClassifier @Since(\"1.4.0\") (`
  * `class GBTClassifier @Since(\"1.4.0\") (`
  * `class RandomForestClassifier @Since(\"1.4.0\") (`
  * `  class AFTSurvivalRegressionWrapperWriter(instance: 
AFTSurvivalRegressionWrapper)`
  * `  class AFTSurvivalRegressionWrapperReader extends 
MLReader[AFTSurvivalRegressionWrapper] `
  * `class DecisionTreeRegressor @Since(\"1.4.0\") (@Since(\"1.4.0\") 
override val uid: String)`
  * `class GBTRegressor @Since(\"1.4.0\") (@Since(\"1.4.0\") override val 
uid: String)`
  * `class RandomForestRegressor @Since(\"1.4.0\") (@Since(\"1.4.0\") 
override val uid: String)`
  * `case class CartesianProductExec(`





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214926444
  
**[Test build #57058 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57058/consoleFull)**
 for PR 12268 at commit 
[`ee71064`](https://github.com/apache/spark/commit/ee7106416ef17e5168a91bab044c6f6db9dbd53b).





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214925695
  
@rxin No problem. Let me just rebase it anyway if it has conflicts; that 
makes the changes easier to track.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-26 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214638048
  
cc @hvanhovell would you have some time to take a look at this?

@HyukjinKwon most of us are very busy trying to get things out for 2.0 so 
this one will very likely slip.






[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214625564
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56965/
Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214625562
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214625432
  
**[Test build #56965 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56965/consoleFull)**
 for PR 12268 at commit 
[`fe63ba2`](https://github.com/apache/spark/commit/fe63ba22d70c1427657b4967e769270d1956be38).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214624336
  
Build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214624337
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56955/
Test PASSed.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214624209
  
**[Test build #56955 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56955/consoleFull)**
 for PR 12268 at commit 
[`ad21b8e`](https://github.com/apache/spark/commit/ad21b8eea981f61cb35de646f3568b27dd2141a3).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214615366
  
**[Test build #56965 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56965/consoleFull)**
 for PR 12268 at commit 
[`fe63ba2`](https://github.com/apache/spark/commit/fe63ba22d70c1427657b4967e769270d1956be38).





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214614924
  
Fixed in 
https://github.com/apache/spark/commit/f8709218115f6c7aa4fb321865cdef8ceb443bd1





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214614770
  
@rxin It looks like this is still failing:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56962
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56963






[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214613784
  
**[Test build #56963 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56963/consoleFull)**
 for PR 12268 at commit 
[`f62755e`](https://github.com/apache/spark/commit/f62755e0875ae8f2947abf8a62505dd77b2ed9f5).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214613795
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56963/
Test FAILed.



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214613793
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214611901
  
**[Test build #56963 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56963/consoleFull)**
 for PR 12268 at commit 
[`f62755e`](https://github.com/apache/spark/commit/f62755e0875ae8f2947abf8a62505dd77b2ed9f5).



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214609981
  
This was due to 
https://github.com/apache/spark/commit/d2614eaadb93a48fba27fe7de64aff942e345f8e



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214609252
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56961/
Test FAILed.



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214609239
  
**[Test build #56961 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56961/consoleFull)**
 for PR 12268 at commit 
[`d59c7e9`](https://github.com/apache/spark/commit/d59c7e98f306fa9ff5dfe3b4caae14a2de746315).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `sealed abstract class LDAModel protected[ml] (`
  * `class LocalLDAModel protected[ml] (`
  * `class DistributedLDAModel protected[ml] (`
  * `class ContinuousQueryManager(sparkSession: SparkSession) `
  * `class DataFrameReader protected[sql](sparkSession: SparkSession) 
extends Logging `
  * `class Dataset[T] protected[sql](`
  * `class QueryExecution(val sparkSession: SparkSession, val logical: 
LogicalPlan) `
  * `class FileStreamSinkLog(sparkSession: SparkSession, path: String)`
  * `class HDFSMetadataLog[T: ClassTag](sparkSession: SparkSession, path: 
String)`
  * `class StreamFileCatalog(sparkSession: SparkSession, path: Path) 
extends FileCatalog with Logging `
  * `case class PlanSubqueries(sparkSession: SparkSession) extends 
Rule[SparkPlan] `



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214609249
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214608734
  
**[Test build #56961 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56961/consoleFull)**
 for PR 12268 at commit 
[`d59c7e9`](https://github.com/apache/spark/commit/d59c7e98f306fa9ff5dfe3b4caae14a2de746315).



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214608678
  
retest this please



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214608176
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56959/
Test FAILed.



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214608175
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214608169
  
**[Test build #56959 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56959/consoleFull)**
 for PR 12268 at commit 
[`d59c7e9`](https://github.com/apache/spark/commit/d59c7e98f306fa9ff5dfe3b4caae14a2de746315).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `sealed abstract class LDAModel protected[ml] (`
  * `class LocalLDAModel protected[ml] (`
  * `class DistributedLDAModel protected[ml] (`
  * `class ContinuousQueryManager(sparkSession: SparkSession) `
  * `class DataFrameReader protected[sql](sparkSession: SparkSession) 
extends Logging `
  * `class Dataset[T] protected[sql](`
  * `class QueryExecution(val sparkSession: SparkSession, val logical: 
LogicalPlan) `
  * `class FileStreamSinkLog(sparkSession: SparkSession, path: String)`
  * `class HDFSMetadataLog[T: ClassTag](sparkSession: SparkSession, path: 
String)`
  * `class StreamFileCatalog(sparkSession: SparkSession, path: Path) 
extends FileCatalog with Logging `
  * `case class PlanSubqueries(sparkSession: SparkSession) extends 
Rule[SparkPlan] `



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214607369
  
**[Test build #56959 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56959/consoleFull)**
 for PR 12268 at commit 
[`d59c7e9`](https://github.com/apache/spark/commit/d59c7e98f306fa9ff5dfe3b4caae14a2de746315).



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214605811
  
**[Test build #56955 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56955/consoleFull)**
 for PR 12268 at commit 
[`ad21b8e`](https://github.com/apache/spark/commit/ad21b8eea981f61cb35de646f3568b27dd2141a3).



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-214554406
  
ping @rxin



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-213663159
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-213663160
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56768/
Test PASSed.



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-213663125
  
**[Test build #56768 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56768/consoleFull)**
 for PR 12268 at commit 
[`92f8f38`](https://github.com/apache/spark/commit/92f8f387cec10cb61e178b312748f86bd75b1b55).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-213658568
  
**[Test build #56768 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56768/consoleFull)**
 for PR 12268 at commit 
[`92f8f38`](https://github.com/apache/spark/commit/92f8f387cec10cb61e178b312748f86bd75b1b55).



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-213658002
  
@rxin Could you please review this?



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-212227632
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-212227633
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56309/
Test PASSed.



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-212227517
  
**[Test build #56309 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56309/consoleFull)**
 for PR 12268 at commit 
[`d9ea3cb`](https://github.com/apache/spark/commit/d9ea3cb5ccb8db5d8ff9e36fa1e8d4df45ea4fb2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-212206817
  
**[Test build #56309 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56309/consoleFull)**
 for PR 12268 at commit 
[`d9ea3cb`](https://github.com/apache/spark/commit/d9ea3cb5ccb8db5d8ff9e36fa1e8d4df45ea4fb2).



[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-210971456
  
Will try to take a look in the next few days.




[GitHub] spark pull request: [SPARK-14480][SQL] Simplify CSV parsing proces...

2016-04-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12268#issuecomment-210971265
  
Please excuse my pings, @cloud-fan, @rxin, @falaki, @yhuai


