Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22954#discussion_r232473643

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---
@@ -225,4 +226,25 @@ private[sql] object SQLUtils extends Logging {
     }
     sparkSession.sessionState.catalog.listTables(db).map(_.table).toArray
   }
+
+  /**
+   * R callable function to read a file in Arrow stream format and create an `RDD`
+   * using each serialized ArrowRecordBatch as a partition.
+   */
+  def readArrowStreamFromFile(
+      sparkSession: SparkSession,
+      filename: String): JavaRDD[Array[Byte]] = {
+    ArrowConverters.readArrowStreamFromFile(sparkSession.sqlContext, filename)
--- End diff --

Hmhmhm .. yeah. What I was trying to do is to gather the SQL-related code that R calls into the JVM here, when it isn't an official API, so that a change to the internal Scala APIs doesn't end up breaking the R tests. I was trying to do a similar thing on the PySpark side.
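For illustration, here is a hypothetical usage sketch of the new helper, not part of the PR. Since SQLUtils is private[sql], this would have to live under the org.apache.spark.sql package (the R backend reaches it via JVM reflection instead); the file path and object name are made up, and the file is assumed to already contain data in Arrow stream format.

    package org.apache.spark.sql.api.r

    import org.apache.spark.api.java.JavaRDD
    import org.apache.spark.sql.SparkSession

    object ReadArrowStreamExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("readArrowStreamFromFile sketch")
          .getOrCreate()

        // Each element of the resulting RDD is one serialized ArrowRecordBatch,
        // and each batch becomes its own partition.
        val batches: JavaRDD[Array[Byte]] =
          SQLUtils.readArrowStreamFromFile(spark, "/tmp/batches.arrow")

        println(s"partitions = ${batches.getNumPartitions}")
        spark.stop()
      }
    }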