[jira] [Commented] (SPARK-6923) Get invalid hive table columns after save DataFrame to hive table

pin_zhang (JIRA) Mon, 20 Apr 2015 23:11:23 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504409#comment-14504409
 ]


pin_zhang commented on SPARK-6923:
----------------------------------

Hi Michael,
  We run a Spark app on Spark 1.3 and use the CLIService in HiveServer2 to get 
the table schema. The call stack for the schema lookup is below:
        HiveMetaStore$HMSHandler.get_fields(String, String) line: 2873
        HiveMetaStore$HMSHandler.get_schema(String, String) line: 2946
        NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
        NativeMethodAccessorImpl.invoke(Object, Object[]) line: 57
        DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
        Method.invoke(Object, Object...) line: 606
        RetryingHMSHandler.invoke(Object, Method, Object[]) line: 105
        $Proxy9.get_schema(String, String) line: not available
        HiveMetaStoreClient.getSchema(String, String) line: 1269
        GetColumnsOperation.run() line: 139
        HiveSessionImplwithUGI(HiveSessionImpl).getColumns(String, String, String, String) line: 359
        NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
        NativeMethodAccessorImpl.invoke(Object, Object[]) line: 57
        DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
        Method.invoke(Object, Object...) line: 606
        HiveSessionProxy.invoke(Method, Object[]) line: 79
        HiveSessionProxy.access$000(HiveSessionProxy, Method, Object[]) line: 37
        HiveSessionProxy$1.run() line: 64
        AccessController.doPrivileged(PrivilegedExceptionAction<T>, AccessControlContext) line: not available [native method]
        Subject.doAs(Subject, PrivilegedExceptionAction<T>) line: 415
        UserGroupInformation.doAs(PrivilegedExceptionAction<T>) line: 1548
        Hadoop23Shims(HadoopShimsSecure).doAs(UserGroupInformation, PrivilegedExceptionAction<T>) line: 493
        HiveSessionProxy.invoke(Object, Method, Object[]) line: 60
        $Proxy17.getColumns(String, String, String, String) line: not available
        SparkSQLCLIService(CLIService).getColumns(SessionHandle, String, String, String, String) line: 309
        ThriftBinaryCLIService(ThriftCLIService).GetColumns(TGetColumnsReq) line: 433
        TCLIService$Processor$GetColumns<I>.getResult(I, GetColumns_args) line: 1433
        TCLIService$Processor$GetColumns<I>.getResult(Object, TBase) line: 1418
        TCLIService$Processor$GetColumns<I>(ProcessFunction<I,T>).process(int, TProtocol, TProtocol, I) line: 39
        TSetIpAddressProcessor<I>(TBaseProcessor<I>).process(TProtocol, TProtocol) line: 39
        TSetIpAddressProcessor<I>.process(TProtocol, TProtocol) line: 55
        TThreadPoolServer$WorkerProcess.run() line: 206
        ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145
        ThreadPoolExecutor$Worker.run() line: 615
        Thread.run() line: 745

   Shouldn't this method return the same table schema as the one you 
mentioned, hctx.table("tableName").schema?
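
   To make the mismatch concrete, here is a minimal sketch of the JDBC-side 
check. The names ColumnCheck, metadataColumns and missingColumns are 
hypothetical and used only for illustration; the code uses only standard 
java.sql calls, and it assumes the driver accepts null for the catalog and 
schema arguments and "%" as the column name pattern. The expected and reported 
column names in main are the ones from the issue description below, not the 
result of a new run.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
class ColumnCheck {

    // Column names as reported by DatabaseMetaData.getColumns, which is the
    // metadata path that ends in HMSHandler.get_schema in the stack above.
    // Assumption: null catalog/schema and "%" as the column pattern are accepted.
    static List<String> metadataColumns(Connection conn, String table) throws SQLException {
        List<String> names = new ArrayList<String>();
        DatabaseMetaData meta = conn.getMetaData();
        ResultSet rs = meta.getColumns(null, null, table, "%");
        try {
            while (rs.next()) {
                names.add(rs.getString("COLUMN_NAME"));
            }
        } finally {
            rs.close();
        }
        return names;
    }

    // Expected columns that the metadata call did not report.
    static List<String> missingColumns(List<String> expected, List<String> reported) {
        List<String> missing = new ArrayList<String>(expected);
        missing.removeAll(reported);
        return missing;
    }

    public static void main(String[] args) {
        // Values copied from the issue description: the DataFrame has id and age,
        // while the metastore reports a single column named col.
        List<String> expected = Arrays.asList("id", "age");
        List<String> reported = Arrays.asList("col");
        System.out.println(missingColumns(expected, reported)); // prints [id, age]
    }
}
```

   Against a live server, metadataColumns(conn, "test") would supply the 
reported list. A non-empty result from missingColumns is the symptom described 
in this issue.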

> Get invalid hive table columns after save DataFrame to hive table
> -----------------------------------------------------------------
>
>                 Key: SPARK-6923
>                 URL: https://issues.apache.org/jira/browse/SPARK-6923
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: pin_zhang
>
> HiveContext hctx = new HiveContext(sc);
> List<String> sample = new ArrayList<String>();
> sample.add( "{\"id\": \"id_1\", \"age\":1}" );
> RDD<String> sampleRDD = new JavaSparkContext(sc).parallelize(sample).rdd();   
> DataFrame df = hctx.jsonRDD(sampleRDD);
> String table="test";
> df.saveAsTable(table, "json",SaveMode.Overwrite);
> Table t = hctx.catalog().client().getTable(table);
> System.out.println( t.getCols());
> --------------------------------------------------------------
> After saving the DataFrame to a Hive table with the code above,
> getting the table columns returns a single column named 'col':
> [FieldSchema(name:col, type:array<string>, comment:from deserializer)]
> The expected field schema is id, age.
> As a result, the JDBC API cannot retrieve the table columns via the ResultSet of
> DatabaseMetaData.getColumns(String catalog, String schemaPattern, String 
> tableNamePattern, String columnNamePattern),
> but the ResultSet metadata for the query "select * from test" does contain the 
> fields id and age.



