GitHub user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16664#discussion_r100366124
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
    --- End diff --
    
    > Actually all the "magic keys" in the options used by DataFrameWriter are public APIs
    
    That's good to know, but they only seem to be, at best, indirectly 
documented. The `DataFrameWriter` API doesn't say anything about the keys used 
by any of the methods, and `sql-programming-guide.md` only touches on a handful 
of them; for example, none of the JDBC keys are documented.
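    
    Just to illustrate the kind of "magic key" I mean, here's a typical JDBC write; nothing in the `DataFrameWriter` scaladoc tells you that `url` and `dbtable` are the keys the JDBC source reads:
    
    ```
    // "url" and "dbtable" are magic keys: option() accepts any string, and
    // only the JDBC source implementation knows which ones it consumes.
    df.write
      .format("jdbc")
      .option("url", "jdbc:postgresql://host/db")
      .option("dbtable", "my_table")
      .save()
    ```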
    
    > If you want to introduce an external public interface, we need a careful design. This should be done in a separate PR.
    
    I agree that it needs a careful design, and the current one doesn't cover all the options. But this PR is of very marginal value without this information being exposed in some way. If you guys feel strongly that it should be a map and nothing more, I guess it will be hard to argue; then we'll have to do that, document all the keys used internally by Spark, make them public, and promise ourselves that we'll never break them.
    
    My belief is that a more structured type would help here. Since the current code obviously doesn't cover everything, we could have something more future-proof, like:
    
    ```
    // Generic: just exposes the raw options; no stability guarantee past
    // what the SQL API provides.
    class QueryExecutionParams(val options: Map[String, String])

    // For FS-based sources.
    class FsOutputParams(
        val dataSourceType: String,
        val path: String,
        options: Map[String, String]) extends QueryExecutionParams(options)

    // For JDBC.
    class JdbcOutputParams(
        val table: String,
        val url: String,
        options: Map[String, String]) extends QueryExecutionParams(options)

    // Add others that are interesting.
    ```
    
    Then listeners can easily handle future param types by matching on the subtypes they care about and falling back to the generic params, as in the sketch below.
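    
    As a rough sketch of the listener side (`onWriteEnd` is a made-up name, not an actual listener method):
    
    ```
    // Match the subtypes you know about; anything added later still
    // surfaces through the generic params and its raw options.
    def onWriteEnd(params: QueryExecutionParams): Unit = params match {
      case fs: FsOutputParams     => println(s"${fs.dataSourceType} -> ${fs.path}")
      case jdbc: JdbcOutputParams => println(s"jdbc -> ${jdbc.url}/${jdbc.table}")
      case other                  => println(s"unknown sink: ${other.options}")
    }
    ```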
    
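    For contrast, with only a raw map every listener has to hard-code the key names and redo the source-type dispatch itself, something like:
    
    ```
    // Every consumer reinvents this, and renaming a key silently breaks
    // them all.
    val destination = options.get("dbtable").orElse(options.get("path"))
    ```
    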
    Anyway, my opinion is that a raw map is not a very good API, regardless of 
API stability; it's hard to use and easy to break. But I'll defer to you guys 
if you really don't like my suggestions.

