Github user goldmedal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19339#discussion_r140779203
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -456,6 +456,40 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
       }
     
       /**
    +   * Loads a `JavaRDD[String]` storing CSV rows and returns the result as a `DataFrame`.
    +   *
    +   * If the schema is not specified using `schema` function and `inferSchema` option is enabled,
    +   * this function goes through the input once to determine the input schema.
    +   *
    +   * If the schema is not specified using `schema` function and `inferSchema` option is disabled,
    +   * it determines the columns as string types and it reads only the first line to determine the
    +   * names and the number of fields.
    +   *
    +   * @param csvRDD input RDD with one CSV row per record
    +   * @since 2.2.0
    +   */
    +  @deprecated("Use csv(Dataset[String]) instead.", "2.2.0")
    +  def csv(csvRDD: JavaRDD[String]): DataFrame = csv(csvRDD.rdd)
    +
    +  /**
    +   * Loads an `RDD[String]` storing CSV rows and returns the result as a `DataFrame`.
    +   *
    +   * If the schema is not specified using `schema` function and `inferSchema` option is enabled,
    +   * this function goes through the input once to determine the input schema.
    +   *
    +   * If the schema is not specified using `schema` function and `inferSchema` option is disabled,
    +   * it determines the columns as string types and it reads only the first line to determine the
    +   * names and the number of fields.
    +   *
    +   * @param csvRDD input RDD with one CSV row per record
    +   * @since 2.2.0
    +   */
    +  @deprecated("Use csv(Dataset[String]) instead.", "2.2.0")
    +  def csv(csvRDD: RDD[String]): DataFrame = {
    --- End diff --
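
    For reference, a minimal sketch of the behavior the scaladoc above describes, using the non-deprecated `csv(Dataset[String])` overload that the deprecation note points to. The column names and rows are made up for illustration, and the commented-out line shows how the `csv(RDD[String])` overload proposed in this diff would be called:

        import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

        object CsvFromStringsSketch {
          def main(args: Array[String]): Unit = {
            val spark = SparkSession.builder()
              .appName("csv-from-strings-sketch")
              .master("local[*]")
              .getOrCreate()
            import spark.implicits._

            // One CSV row per element; the first element is the header line.
            val rows = Seq("name,age", "alice,30", "bob,25")
            val csvDS: Dataset[String] = spark.createDataset(rows)

            // With `inferSchema` enabled the reader goes through the input once to
            // determine column types; with it disabled every column is a string and
            // only the first line is read for the names and the number of fields.
            val df: DataFrame = spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv(csvDS)

            df.printSchema()
            df.show()

            // The overload proposed in this diff would accept an RDD[String] directly:
            // spark.read.option("header", "true").csv(spark.sparkContext.parallelize(rows))

            spark.stop()
          }
        }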
    
    Thanks for your review :)
    Umm.. I followed `spark.read.json`'s approach when adding them. Although `json(jsonRDD: RDD[String])` has been deprecated, PySpark still uses it to create a `DataFrame`. I think adding a private wrapper in Scala may be better, because not only PySpark but also SparkR may need it. A rough sketch of that idea is below.
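
    In Spark itself such a wrapper would presumably be a `private[sql]` method on `DataFrameReader` that the Python and R bridges call; to keep this sketch self-contained it is written as an extension method, and all names here (`csvFromRDD`, the object name) are hypothetical rather than taken from the PR:

        import org.apache.spark.rdd.RDD
        import org.apache.spark.sql.{DataFrame, DataFrameReader, Encoders, SparkSession}

        // Hypothetical illustration only: route RDD[String] input through the public
        // csv(Dataset[String]) overload so the RDD path adds no separate parsing logic.
        object CsvRddWrapperSketch {

          implicit class CsvRddSupport(reader: DataFrameReader) {
            def csvFromRDD(spark: SparkSession, csvRDD: RDD[String]): DataFrame =
              reader.csv(spark.createDataset(csvRDD)(Encoders.STRING))
          }

          def main(args: Array[String]): Unit = {
            val spark = SparkSession.builder()
              .appName("csv-rdd-wrapper-sketch")
              .master("local[*]")
              .getOrCreate()

            val rdd: RDD[String] = spark.sparkContext.parallelize(Seq("name,age", "alice,30"))
            spark.read.option("header", "true").csvFromRDD(spark, rdd).show()

            spark.stop()
          }
        }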

