Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2576#discussion_r18765864

    --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/HiveFromSpark.scala ---
    @@ -62,6 +62,16 @@ object HiveFromSpark {
         println("Result of SELECT *:")
         sql("SELECT * FROM records r JOIN src s ON r.key = s.key").collect().foreach(println)

    +    // Write out an RDD as an ORC file.
    +    rdd.saveAsOrcFile("pair.orc")
    +
    +    // Read in the ORC file. ORC files are self-describing, so the schema is preserved.
    +    val orcFile = hiveContext.orcFile("pair.orc")
    +
    +    // These files can also be registered as tables.
    +    orcFile.registerTempTable("orcFile")
    +    sql("SELECT * FROM records r JOIN orcFile s ON r.key = s.key").collect().foreach(println)
    +
    --- End diff --

    I think test cases and documentation are better places to illustrate the API usage. This example is meant to show how Spark SQL cooperates with Hive, and with this PR we no longer need Hive (the metastore) to access ORC files.
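    For context, a self-contained round trip using the API shown in the diff might look like the sketch below. It assumes the saveAsOrcFile/orcFile methods proposed in this PR behave like their existing Parquet counterparts, and it reuses a Record case class like the one already defined in HiveFromSpark; the OrcRoundTrip object name and "pair.orc" path are illustrative only, not part of the PR.

        // A sketch only: assumes the saveAsOrcFile/orcFile API added by this PR
        // and mirrors the Record case class already used in HiveFromSpark.
        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.sql.hive.HiveContext

        case class Record(key: Int, value: String)

        object OrcRoundTrip {
          def main(args: Array[String]) {
            val sc = new SparkContext(new SparkConf().setAppName("OrcRoundTrip"))
            val hiveContext = new HiveContext(sc)
            // Brings in the implicit RDD-to-SchemaRDD conversion and the sql method.
            import hiveContext._

            val rdd = sc.parallelize((1 to 10).map(i => Record(i, s"val_$i")))

            // Round-trip through ORC without touching the Hive metastore.
            rdd.saveAsOrcFile("pair.orc")
            val orcFile = hiveContext.orcFile("pair.orc")

            // The schema is recovered from the ORC file itself.
            orcFile.registerTempTable("orc_records")
            sql("SELECT key, value FROM orc_records WHERE key < 5")
              .collect()
              .foreach(println)

            sc.stop()
          }
        }

    Something along these lines is what a test case or the documentation could exercise, which is the point above: the example file stays focused on Hive interop, while the ORC-specific round trip lives in tests and docs.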