For me, inference is not the issue so much as persistence. Imagine a streaming application whose input is JSON with a format that can vary from row to row and cannot be pre-determined. I can use `sqlContext.jsonRDD`, but once I have the `SchemaRDD`, there is no way for me to update the DDL of the Hive table to add any extra columns I may have encountered in a JSON row.
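To make the persistence problem concrete: the decision at each step is to compare the table's current columns against the columns seen in a new row, and add whatever is missing. A minimal sketch of that diff in plain Scala (no Spark; the helper names, table name, and type strings are all hypothetical, not an actual Spark or Hive API):

```scala
// Hypothetical helper: given the columns Hive currently knows about and the
// columns inferred from a new JSON row, produce the ALTER TABLE statement
// (if any) needed to bring the table up to date. Schemas are modeled as
// simple name -> Hive-type maps for illustration.
def missingColumns(current: Map[String, String],
                   incoming: Map[String, String]): Map[String, String] =
  incoming.filterNot { case (name, _) => current.contains(name) }

def alterTableDdl(table: String,
                  current: Map[String, String],
                  incoming: Map[String, String]): Option[String] = {
  val missing = missingColumns(current, incoming)
  if (missing.isEmpty) None
  else {
    val cols = missing.toSeq.sorted.map { case (n, t) => s"$n $t" }.mkString(", ")
    Some(s"ALTER TABLE $table ADD COLUMNS ($cols)")
  }
}

val hiveSchema = Map("id" -> "BIGINT", "name" -> "STRING")
val rowSchema  = Map("id" -> "BIGINT", "name" -> "STRING", "email" -> "STRING")

println(alterTableDdl("events", hiveSchema, rowSchema))
// prints Some(ALTER TABLE events ADD COLUMNS (email STRING))
```

This is exactly the "query Hive at every step" loop described further down the thread; the sketch only shows the diffing logic, not how to read the current schema back from the metastore.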
-- Hari

On Tue, Oct 21, 2014 at 6:11 PM, Cheng Lian <lian.cs....@gmail.com> wrote:
> You can resort to SQLContext.jsonFile(path: String, samplingRate: Double)
> and set samplingRate to 1.0, so that all the columns can be inferred.
>
> You can also use SQLContext.applySchema to specify your own schema (which
> is a StructType).
>
> On 10/22/14 5:56 AM, Harivardan Jayaraman wrote:
>
> Hi,
> I have unstructured JSON as my input, which may have extra columns from
> row to row. I want to store these JSON rows using HiveContext so that they
> can be accessed from the JDBC Thrift Server.
> I notice there are primarily only two methods available on the SchemaRDD
> for data: saveAsTable and insertInto. One defines the schema while the
> other can be used to insert into the table, but there is no way to alter
> the table and add columns to it. How do I do this?
>
> One option I thought of is to write native "CREATE TABLE..." and
> "ALTER TABLE..." statements, but that just does not seem feasible, because
> at every step I would need to query Hive to determine the current schema
> and decide whether or not to add columns to it.
>
> Any thoughts? Has anyone been able to do this?
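Cheng's first suggestion works because sampling every row lets the inferred schema cover every column that appears anywhere in the input. As a toy model of that idea (plain Scala, no Spark; per-row schemas represented as hypothetical name-to-type maps, not Spark's actual StructType), full-sample inference amounts to taking the union of the fields seen across rows:

```scala
// Toy model of schema inference with samplingRate = 1.0: scan every row
// and accumulate the union of all fields observed. Later occurrences of a
// column simply keep its type (type-widening is ignored in this sketch).
def unionSchema(rows: Seq[Map[String, String]]): Map[String, String] =
  rows.foldLeft(Map.empty[String, String])(_ ++ _)

val rows = Seq(
  Map("id" -> "BIGINT"),                      // first row: only id
  Map("id" -> "BIGINT", "name" -> "STRING"),  // second row adds name
  Map("email" -> "STRING")                    // third row adds email only
)

println(unionSchema(rows).keySet)
```

With a samplingRate below 1.0 some rows are skipped, so a column that only appears in rare rows can be missed; that is why full sampling (or an explicitly supplied schema via applySchema) is needed when the format varies arbitrarily.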