[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...
Github user JustinPihony commented on the pull request: https://github.com/apache/spark/pull/12601#issuecomment-220431230 I just updated the branch to have no conflicts. Again, either the code looks good to merge, or I can make JDBC a `CreatableRelationProvider` (but that comes with additional baggage as already discussed...might be better in a separate change if at all?) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...
Github user JustinPihony commented on the pull request: https://github.com/apache/spark/pull/12601#issuecomment-217359217 I can finish this next Monday(fix the conflicts that now exist), and will actually do that given the above comments. I'd still like to get an opinion on whether I should change the code to be a `CreatableRelationProvider` or not? I like the idea, but believe that there is much more room for error with that aspect. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12601#issuecomment-216445374 @rxin I also realised Python API is supporting properties as a dict having " arbitrary string tag/value", [here](https://github.com/apache/spark/blob/d37c7f7f042f7943b5b684e53cf4284c601fb347/python/pyspark/sql/readwriter.py#L847-L849). Could we go ahead with this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/12601#issuecomment-214992526 Even if so, I think we need a kinda wrapper or something to safely convert `Properties` to `Map` because newbie users could easily & wrongly put non-string values in `Properties`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...
Github user JustinPihony commented on the pull request: https://github.com/apache/spark/pull/12601#issuecomment-213662908 @HyukjinKwon I just posted on the JIRA the background of `Properties` and how reasonable it is to assume it can be converted to a `String`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12601#issuecomment-213568166 A possible problem was noticed (is `Properties` guaranteed to be converted to `String`?) in the JIRA before this PR and any evidence was not prodivded or said before opening a PR, which I did instead in this PR. Also, I think it is a minor but still it looks not sensible that `Additional details` does not look related with the PR description. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...
Github user JustinPihony commented on a diff in the pull request: https://github.com/apache/spark/pull/12601#discussion_r60752881 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -244,7 +244,11 @@ final class DataFrameWriter private[sql](df: DataFrame) { bucketSpec = getBucketSpec, options = extraOptions.toMap) -dataSource.write(mode, df) +dataSource.providingClass.newInstance() match { + case jdbc: execution.datasources.jdbc.DefaultSource => --- End diff -- I agree and admit I was being lazy in not trying to figure out how to make the current implementation return a `BaseRelation`. I'll take another look today at just turning the DefaultSource into a `CreatableRelationProvider` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...
Github user JustinPihony commented on the pull request: https://github.com/apache/spark/pull/12601#issuecomment-213463779 @HyukjinKwon You will notice that I opted to not deprecate jdbc as I don't think that would be the correct path anyway (unless all format methods were to be deprecated). I'm not sure what gave the impression that I wanted to deprecate, but I think multiple methods to accomplish the same goal is perfectly fine. This change merely alters the underlying implementation so that both methods work, `write.format("jdbc")` and `write.jdbc`. So, I think that this realistically addresses all of your other comments surrounding this concern of deprecation. As to additional details, I was simply [following the contributer directions.](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-PullRequest) I only put it under additional details because that was what it was to me. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12601#issuecomment-213397222 Question: I found a [PPT](http://www.slideshare.net/SparkSummit/structuring-spark-dataframes-datasets-and-streaming-by-michael-armbrust) which I think is used in Spark Summit by @marmbrus. In 30 page, `write.format("jdbc")` is used as a example. Is there any way.to support this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12601#issuecomment-213382156 I think I can rework based on this because it is anyway opened already. Excuse my ping @rxin --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12601#issuecomment-213382065 BTW, it looks it is pretty general that `Properties` just works like `HashMap[String, String]` in most cases. Firstly, I just checked [java.sql.Driver API](https://docs.oracle.com/javase/7/docs/api/java/sql/Driver.html) and it describes the argument for `Properties` as below: >info - a list of arbitrary string tag/value pairs as connection arguments. Normally at least a "user" and "password" property should be included. Secondly, apparently Spark uses the methods below in `Properties`. ```java public Set stringPropertyNames() public String getProperty(String key, String defaultValue) public void store(OutputStream out, String comments) // This uses convert for keys and values internally. public synchronized Object setProperty(String key, String value) ``` It looks they uses `String` for keys and values. So, I think it might be OK to support `write.format("jdbc")`. I believe `read.format("jdbc")` is already being supported and I could not find JIRA issues about the problem for giving some options for `read.format("jdbc")`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12601#discussion_r60721164 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -244,7 +244,11 @@ final class DataFrameWriter private[sql](df: DataFrame) { bucketSpec = getBucketSpec, options = extraOptions.toMap) -dataSource.write(mode, df) +dataSource.providingClass.newInstance() match { + case jdbc: execution.datasources.jdbc.DefaultSource => --- End diff -- It looks a new method is introduced. I think we don't necessarily introduce this new function but use the existing interfaces `CreatableRelationProvider` in `interfaces`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12601#issuecomment-213298866 I think `Additional details` could be said in comments not in the PR description because PR description describes what the PR is. Maybe `Additional details` is not related with the PR itself. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12601#issuecomment-213297906 @JustinPihony I think we haven't reached the conclusion yet and haven't got any feedback, from committers, if we should deprecate `read.jdbc()` or support `write.format(jdbc)` in [SPARK-14525](https://issues.apache.org/jira/browse/SPARK-14525). Usually, we should discuss the proper ways to fix and problems first in JIRA and open a PR. As I said in the JIRA, if we go for deprecating `read.jdbc()`, we might need to close this JIRA and create another one. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...
GitHub user JustinPihony opened a pull request: https://github.com/apache/spark/pull/12601 [SPARK-14525][SQL] Make DataFrameWrite.save work for jdbc ## What changes were proposed in this pull request? This change modifies the implementation of DataFrameWriter.save such that it works with jdbc, and the call to jdbc merely delegates to save. ## How was this patch tested? This was tested via unit tests in the JDBCWriteSuite, of which I added one new test to cover this scenario. ## Additional details @rxin This seems to have been most recently touched by you and was also commented on in the JIRA. This contribution is my original work and I license the work to the project under the project's open source license. You can merge this pull request into a Git repository by running: $ git pull https://github.com/JustinPihony/spark jdbc_reconciliation Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12601.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12601 commit 6be5046119f1f9cffb78a08a9dbe614ed4f9d460 Author: Justin Pihony Date: 2016-04-22T05:55:34Z [SPARK-14525][SQL] Make DataFrameWrite.save work for jdbc --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12601#issuecomment-213274569 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org