[ https://issues.apache.org/jira/browse/SPARK-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai updated SPARK-6923:
----------------------------
    Description: 

{code:java}
// Minimal reproduction; assumes the usual Spark 1.3 imports (HiveContext, DataFrame,
// SaveMode, JavaSparkContext, RDD, org.apache.hadoop.hive.ql.metadata.Table).
HiveContext hctx = new HiveContext(sc);
// Build a one-row JSON RDD and infer a DataFrame with columns id and age.
List<String> sample = new ArrayList<String>();
sample.add("{\"id\": \"id_1\", \"age\":1}");
RDD<String> sampleRDD = new JavaSparkContext(sc).parallelize(sample).rdd();
DataFrame df = hctx.jsonRDD(sampleRDD);
// Persist the DataFrame as a JSON data source table in the Hive metastore.
String table = "test";
df.saveAsTable(table, "json", SaveMode.Overwrite);
// Read the table definition back from the metastore and print its columns.
Table t = hctx.catalog().client().getTable(table);
System.out.println(t.getCols());
{code}

--------------------------------------------------------------
When the code above saves the DataFrame as a Hive table, getCols() returns a single column named 'col':

[FieldSchema(name:col, type:array<string>, comment:from deserializer)]

The expected field schema is id, age.

As a result, the JDBC API cannot retrieve the table's columns via
DatabaseMetaData.getColumns(String catalog, String schemaPattern, String tableNamePattern, String columnNamePattern).

However, the ResultSetMetaData for the query "select * from test" does contain the fields id and age (see the JDBC sketch after the quoted issue below).


> Spark SQL CLI does not read Data Source schema correctly
> --------------------------------------------------------
>
>                 Key: SPARK-6923
>                 URL: https://issues.apache.org/jira/browse/SPARK-6923
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: pin_zhang
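
To see the mismatch from the client side, here is a minimal JDBC sketch. It is an illustration only: it assumes a Spark SQL Thrift server listening on localhost:10000 with the Hive JDBC driver on the classpath and no authentication, and the class name ColumnMetadataCheck is made up for the example. Per the report, DatabaseMetaData.getColumns shows only the placeholder 'col' column, while the ResultSetMetaData of "select * from test" lists id and age.

{code:java}
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class ColumnMetadataCheck {
    public static void main(String[] args) throws Exception {
        // May be needed for older Hive JDBC driver versions.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Assumed connection details; adjust host/port/credentials to your setup.
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "", "")) {

            // 1) Catalog view: columns the metastore reports for table "test".
            DatabaseMetaData dbMeta = conn.getMetaData();
            try (ResultSet cols = dbMeta.getColumns(null, null, "test", null)) {
                while (cols.next()) {
                    // Per the report, this prints only the placeholder column "col".
                    System.out.println("getColumns: " + cols.getString("COLUMN_NAME")
                            + " (" + cols.getString("TYPE_NAME") + ")");
                }
            }

            // 2) Query view: columns reported by the result set of a full scan.
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("select * from test")) {
                ResultSetMetaData rsMeta = rs.getMetaData();
                for (int i = 1; i <= rsMeta.getColumnCount(); i++) {
                    // Per the report, this does list "id" and "age".
                    System.out.println("resultSet: " + rsMeta.getColumnName(i));
                }
            }
        }
    }
}
{code}

Run against the table created by the snippet in the description; per the report, the two printed column lists disagree until the data source schema is propagated to the metastore.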