Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2576#discussion_r18765864

    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/HiveFromSpark.scala ---
    @@ -62,6 +62,16 @@ object HiveFromSpark {
         println("Result of SELECT *:")
         sql("SELECT * FROM records r JOIN src s ON r.key = s.key").collect().foreach(println)

    +    // Write out an RDD as an ORC file.
    +    rdd.saveAsOrcFile("pair.orc")
    +
    +    // Read in the ORC file. ORC files are self-describing, so the schema is preserved.
    +    val orcFile = hiveContext.orcFile("pair.orc")
    +
    +    // These files can also be registered as tables.
    +    orcFile.registerTempTable("orcFile")
    +    sql("SELECT * FROM records r JOIN orcFile s ON r.key = s.key").collect().foreach(println)
    +
    --- End diff --

    I think test cases and documentation are better places to illustrate the API usage. This example is meant to show how Spark SQL cooperates with Hive, and with this PR we no longer need Hive (the metastore) to access ORC files.
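    For context, a self-contained round trip using the API shown in the diff might look like the sketch below. It assumes the saveAsOrcFile/orcFile methods proposed in this PR behave like their existing Parquet counterparts, and it reuses a Record case class like the one already defined in HiveFromSpark; the OrcRoundTrip object name and "pair.orc" path are illustrative only, not part of the PR.

        // A sketch only: assumes the saveAsOrcFile/orcFile API added by this PR
        // and mirrors the Record case class already used in HiveFromSpark.
        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.sql.hive.HiveContext

        case class Record(key: Int, value: String)

        object OrcRoundTrip {
          def main(args: Array[String]) {
            val sc = new SparkContext(new SparkConf().setAppName("OrcRoundTrip"))
            val hiveContext = new HiveContext(sc)
            // Brings in the implicit RDD-to-SchemaRDD conversion and the sql method.
            import hiveContext._

            val rdd = sc.parallelize((1 to 10).map(i => Record(i, s"val_$i")))

            // Round-trip through ORC without touching the Hive metastore.
            rdd.saveAsOrcFile("pair.orc")
            val orcFile = hiveContext.orcFile("pair.orc")

            // The schema is recovered from the ORC file itself.
            orcFile.registerTempTable("orc_records")
            sql("SELECT key, value FROM orc_records WHERE key < 5")
              .collect()
              .foreach(println)

            sc.stop()
          }
        }

    Something along these lines is what a test case or the documentation could exercise, which is the point above: the example file stays focused on Hive interop, while the ORC-specific round trip lives in tests and docs.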