[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-05-19 Thread JustinPihony
Github user JustinPihony commented on the pull request:

https://github.com/apache/spark/pull/12601#issuecomment-220431230
  
I just updated the branch to have no conflicts. Again, either the code 
looks good to merge, or I can make JDBC a `CreatableRelationProvider` (but that 
comes with additional baggage as already discussed...might be better in a 
separate change if at all?)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-05-05 Thread JustinPihony
Github user JustinPihony commented on the pull request:

https://github.com/apache/spark/pull/12601#issuecomment-217359217
  
I can finish this next Monday(fix the conflicts that now exist), and will 
actually do that given the above comments. I'd still like to get an opinion on 
whether I should change the code to be a `CreatableRelationProvider` or not? I 
like the idea, but believe that there is much more room for error with that 
aspect. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-05-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12601#issuecomment-216445374
  
@rxin I also realised Python API is supporting properties as a dict having 
" arbitrary string tag/value", 
[here](https://github.com/apache/spark/blob/d37c7f7f042f7943b5b684e53cf4284c601fb347/python/pyspark/sql/readwriter.py#L847-L849).
 Could we go ahead with this? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-04-27 Thread maropu
Github user maropu commented on the pull request:

https://github.com/apache/spark/pull/12601#issuecomment-214992526
  
Even if so, I think we need a kinda wrapper or something to safely convert 
`Properties` to `Map` because newbie users could easily & 
wrongly put non-string values in `Properties`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-04-22 Thread JustinPihony
Github user JustinPihony commented on the pull request:

https://github.com/apache/spark/pull/12601#issuecomment-213662908
  
@HyukjinKwon I just posted on the JIRA the background of `Properties` and 
how reasonable it is to assume it can be converted to a `String`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-04-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12601#issuecomment-213568166
  
A possible problem was noticed (is `Properties` guaranteed to be converted 
to `String`?) in the JIRA before this PR and any evidence was not prodivded or 
said before opening a PR, which I did instead in this PR.

Also, I think it is a minor but still it looks not sensible that 
`Additional details` does not look related with the PR description.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-04-22 Thread JustinPihony
Github user JustinPihony commented on a diff in the pull request:

https://github.com/apache/spark/pull/12601#discussion_r60752881
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -244,7 +244,11 @@ final class DataFrameWriter private[sql](df: 
DataFrame) {
   bucketSpec = getBucketSpec,
   options = extraOptions.toMap)
 
-dataSource.write(mode, df)
+dataSource.providingClass.newInstance() match {
+  case jdbc: execution.datasources.jdbc.DefaultSource =>
--- End diff --

I agree and admit I was being lazy in not trying to figure out how to make 
the current implementation return a `BaseRelation`. I'll take another look 
today at just turning the DefaultSource into a `CreatableRelationProvider`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-04-22 Thread JustinPihony
Github user JustinPihony commented on the pull request:

https://github.com/apache/spark/pull/12601#issuecomment-213463779
  
@HyukjinKwon You will notice that I opted to not deprecate jdbc as I don't 
think that would be the correct path anyway (unless all format methods were to 
be deprecated). I'm not sure what gave the impression that I wanted to 
deprecate, but I think multiple methods to accomplish the same goal is 
perfectly fine.  This change merely alters the underlying implementation so 
that both methods work, `write.format("jdbc")` and `write.jdbc`. So, I think 
that this realistically addresses all of your other comments surrounding this 
concern of deprecation.

As to additional details, I was simply [following the contributer 
directions.](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-PullRequest)
 I only put it under additional details because that was what it was to me. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-04-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12601#issuecomment-213397222
  
Question: I found a 
[PPT](http://www.slideshare.net/SparkSummit/structuring-spark-dataframes-datasets-and-streaming-by-michael-armbrust)
 which I think is used in Spark Summit by @marmbrus. In 30 page, 
`write.format("jdbc")` is used as a example. Is there any way.to support this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-04-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12601#issuecomment-213382156
  
I think I can rework based on this because it is anyway opened already. 
Excuse my ping @rxin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-04-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12601#issuecomment-213382065
  
BTW, it looks it is pretty general that `Properties` just works like 
`HashMap[String, String]` in most cases.

Firstly, I just checked [java.sql.Driver 
API](https://docs.oracle.com/javase/7/docs/api/java/sql/Driver.html) and it 
describes the argument for `Properties` as below:

>info - a list of arbitrary string tag/value pairs as connection arguments. 
Normally at least a "user" and "password" property should be included.

Secondly, apparently Spark uses the methods below in `Properties`. 

```java
public Set stringPropertyNames()
public String getProperty(String key, String defaultValue)
public void store(OutputStream out, String comments)  // This uses convert 
for keys and values internally.
public synchronized Object setProperty(String key, String value)
```

It looks they uses `String` for keys and values.


So, I think it might be OK to support `write.format("jdbc")`. I believe 
`read.format("jdbc")` is already being supported and I could not find JIRA 
issues about the problem for giving some options for `read.format("jdbc")`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-04-22 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12601#discussion_r60721164
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -244,7 +244,11 @@ final class DataFrameWriter private[sql](df: 
DataFrame) {
   bucketSpec = getBucketSpec,
   options = extraOptions.toMap)
 
-dataSource.write(mode, df)
+dataSource.providingClass.newInstance() match {
+  case jdbc: execution.datasources.jdbc.DefaultSource =>
--- End diff --

It looks a new method is introduced. I think we don't necessarily introduce 
this new function but use the existing interfaces `CreatableRelationProvider` 
in `interfaces`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-04-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12601#issuecomment-213298866
  
I think `Additional details` could be said in comments not in the PR 
description because PR description describes what the PR is. Maybe `Additional 
details` is not related with the PR itself.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-04-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12601#issuecomment-213297906
  
@JustinPihony I think we haven't reached the conclusion yet and haven't got 
any feedback, from committers, if we should deprecate `read.jdbc()` or support 
`write.format(jdbc)` in 
[SPARK-14525](https://issues.apache.org/jira/browse/SPARK-14525).

Usually, we should discuss the proper ways to fix and problems first in 
JIRA and open a PR. 

As I said in the JIRA, if we go for deprecating `read.jdbc()`, we might 
need to close this JIRA and create another one. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-04-21 Thread JustinPihony
GitHub user JustinPihony opened a pull request:

https://github.com/apache/spark/pull/12601

[SPARK-14525][SQL] Make DataFrameWrite.save work for jdbc

## What changes were proposed in this pull request?

This change modifies the implementation of DataFrameWriter.save such that 
it works with jdbc, and the call to jdbc merely delegates to save. 

## How was this patch tested?

This was tested via unit tests in the JDBCWriteSuite, of which I added one 
new test to cover this scenario.

## Additional details

@rxin This seems to have been most recently touched by you and was also 
commented on in the JIRA. 

This contribution is my original work and I license the work to the project 
under the project's open source license.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JustinPihony/spark jdbc_reconciliation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12601.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12601


commit 6be5046119f1f9cffb78a08a9dbe614ed4f9d460
Author: Justin Pihony 
Date:   2016-04-22T05:55:34Z

[SPARK-14525][SQL] Make DataFrameWrite.save work for jdbc




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

2016-04-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12601#issuecomment-213274569
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org