Re: Spark SQL Parquet as External table - 1.3.x HiveMetastoreType now hidden

2015-04-13 Thread Michael Armbrust

 Here is the stack trace. The first part shows the log when the session is
 started in Tableau. It is using the initial SQL option on the data
 connection to create the TEMPORARY table myNodeTable.


Ah, I see. Thanks for providing the error.  The problem here is that
temporary tables do not exist in a database.  They are visible no matter
what the current database is.  Tableau is asking for
default.temporaryTable, which does not exist.
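
A minimal sketch of the behavior, assuming a HiveContext named sqlContext
and a placeholder parquet path:

// A temporary table created through the Data Sources API lives only in
// the current session and is not registered under any metastore database.
sqlContext.sql("""
  |CREATE TEMPORARY TABLE myNodeTable
  |USING parquet
  |OPTIONS (
  |  path '/path/to/file'
  |)""".stripMargin)

// Resolves: temporary tables are visible no matter what the current database is.
sqlContext.sql("SELECT COUNT(*) FROM myNodeTable")

// Fails: default.myNodeTable refers to a metastore table that does not exist.
// sqlContext.sql("SELECT COUNT(*) FROM default.myNodeTable")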


Spark SQL Parquet as External table - 1.3.x HiveMetastoreType now hidden

2015-04-06 Thread Todd Nist
In 1.2.1 I was persisting a set of parquet files as a table for use by
the spark-sql CLI later on. There was a post here
http://apache-spark-user-list.1001560.n3.nabble.com/persist-table-schema-in-spark-sql-tt16297.html#a16311
by Michael Armbrust that provides a nice little helper method for dealing
with this:

/**
 * Sugar for creating a Hive external table from a parquet path.
 */
def createParquetTable(name: String, file: String): Unit = {
  import org.apache.spark.sql.hive.HiveMetastoreTypes

  val rdd = parquetFile(file)
  val schema = rdd.schema.fields
    .map(f => s"${f.name} ${HiveMetastoreTypes.toMetastoreType(f.dataType)}")
    .mkString(",\n")
  val ddl = s"""
    |CREATE EXTERNAL TABLE $name (
    |  $schema
    |)
    |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    |STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    |OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
    |LOCATION '$file'""".stripMargin
  sql(ddl)
  // Read the table with Spark SQL's native parquet support.
  setConf("spark.sql.hive.convertMetastoreParquet", "true")
}
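
A hypothetical invocation (the table name and path are illustrative only):

createParquetTable("events", "/data/warehouse/events")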

In migrating to 1.3.x I see that the spark.sql.hive.convertMetastoreParquet
is no longer public, so the above no longer works.

I can define a helper method that wraps HiveMetastoreTypes, something
like:

package org.apache.spark.sql.hive

import org.apache.spark.sql.types.DataType

/**
 * Helper to expose HiveMetastoreTypes, which is hidden by Spark.  It is
 * created in this namespace to make it accessible.
 */
object HiveTypeHelper {
  def toDataType(metastoreType: String): DataType =
    HiveMetastoreTypes.toDataType(metastoreType)

  def toMetastoreType(dataType: DataType): String =
    HiveMetastoreTypes.toMetastoreType(dataType)
}
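
For illustration, the wrapper can then be used from application code; a
sketch assuming a SQLContext named sqlContext and a placeholder parquet path:

import org.apache.spark.sql.hive.HiveTypeHelper

val df = sqlContext.parquetFile("/path/to/file")  // placeholder path
val columns = df.schema.fields
  .map(f => s"${f.name} ${HiveTypeHelper.toMetastoreType(f.dataType)}")
  .mkString(",\n")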

While this will work, is there a better way to achieve this under 1.3.x?

TIA for the assistance.

-Todd


Re: Spark SQL Parquet as External table - 1.3.x HiveMetastoreType now hidden

2015-04-06 Thread Michael Armbrust
Hey Todd,

In migrating to 1.3.x I see that the spark.sql.hive.convertMetastoreParquet
 is no longer public, so the above no longer works.


This was probably just a typo, but to be clear,
spark.sql.hive.convertMetastoreParquet is still a supported option and
should work.  You are correct that the HiveMetastoreTypes class is now
private (we made a lot of stuff private starting with 1.3 and the removal
of alpha, since we are now promising binary compatibility for public
APIs).  Your hack seems reasonable, but I'll caution that this is not a
stable public API, so it could break with future upgrades.
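
For reference, a minimal sketch of setting that option, assuming a
HiveContext named sqlContext:

// Read metastore parquet tables with Spark SQL's native parquet support.
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")

// Equivalent from SQL:
sqlContext.sql("SET spark.sql.hive.convertMetastoreParquet=true")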

While this will work, is there a better way to achieve this under 1.3.x?


If you are only looking for the ability to read this data with Spark SQL
(and not Hive), I suggest you look at the Data Sources API syntax for
creating tables.  You don't need to specify the schema at all for
self-describing formats like parquet.

CREATE TABLE tableName
USING parquet
OPTIONS (
  path '/path/to/file'
)
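
The same statement can be issued programmatically; a minimal sketch (the
table name and path are placeholders):

sqlContext.sql("""
  |CREATE TABLE tableName
  |USING parquet
  |OPTIONS (
  |  path '/path/to/file'
  |)""".stripMargin)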

Michael

