[ https://issues.apache.org/jira/browse/PHOENIX-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205639#comment-14205639 ]

Robert Roland commented on PHOENIX-1430:
----------------------------------------

That seems to have fixed it. I was using the version of Phoenix bundled with
HDP 2.2 Preview during development.

Thanks for the amazingly fast response!

> Spark queries against tables with VARCHAR ARRAY columns fail
> ------------------------------------------------------------
>
>                 Key: PHOENIX-1430
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1430
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.1
>            Reporter: Robert Roland
>
> Running Phoenix 4.1 against HDP 2.2 Preview, I'm unable to execute queries in 
> Spark against tables that contain VARCHAR ARRAY columns. Given the error, I 
> think it's likely to affect any array column.
> Given the following table schema:
> {noformat}
> CREATE TABLE ARRAY_TEST_TABLE (
>   ID BIGINT NOT NULL,
>   STRING_ARRAY VARCHAR[],
>   CONSTRAINT pk PRIMARY KEY (ID));
> {noformat}
> I am unable to execute a query via Spark, using the PhoenixInputFormat:
> {noformat}
> val phoenixConf = new PhoenixPigConfiguration(new Configuration())
> phoenixConf.setSelectStatement("SELECT ID, STRING_ARRAY FROM ARRAY_TEST_TABLE")
> phoenixConf.setSelectColumns("ID,STRING_ARRAY")
> phoenixConf.setSchemaType(SchemaType.QUERY)
> phoenixConf.configure("sandbox.hortonworks.com:2181:/hbase-unsecure", "ARRAY_TEST_TABLE", 100)
> val phoenixRDD = sc.newAPIHadoopRDD(phoenixConf.getConfiguration,
>   classOf[PhoenixInputFormat],
>   classOf[NullWritable],
>   classOf[PhoenixRecord])
> val count = phoenixRDD.count()
> {noformat}
> I get the following error:
> {noformat}
>   java.lang.RuntimeException: java.sql.SQLException: org.apache.phoenix.schema.IllegalDataException: Unsupported sql type: VARCHAR ARRAY
>   at org.apache.phoenix.pig.hadoop.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:162)
>   at org.apache.phoenix.pig.hadoop.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:88)
>   at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:94)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1135)
>   at org.apache.spark.rdd.RDD.count(RDD.scala:904)
>   at com.simplymeasured.spark.PhoenixRDDTest$$anonfun$4.apply$mcV$sp(PhoenixRDDTest.scala:147)
>   ...
>   Cause: java.sql.SQLException: org.apache.phoenix.schema.IllegalDataException: Unsupported sql type: VARCHAR ARRAY
>   at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:947)
>   at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1171)
>   at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:315)
>   at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:284)
>   at org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:289)
>   at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:210)
>   at org.apache.phoenix.compile.FromCompiler.getResolverForQuery(FromCompiler.java:158)
>   at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:300)
>   at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:290)
>   at org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:926)
>   ...
>   Cause: org.apache.phoenix.schema.IllegalDataException: Unsupported sql type: VARCHAR ARRAY
>   at org.apache.phoenix.schema.PDataType.fromSqlTypeName(PDataType.java:6977)
>   at org.apache.phoenix.schema.PColumnImpl.createFromProto(PColumnImpl.java:195)
>   at org.apache.phoenix.schema.PTableImpl.createFromProto(PTableImpl.java:848)
>   at org.apache.phoenix.coprocessor.MetaDataProtocol$MetaDataMutationResult.constructFromProto(MetaDataProtocol.java:158)
>   at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:939)
>   at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1171)
>   at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:315)
>   at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:284)
>   at org.apache.phoenix.compile.FromCompiler$BaseColumnResolver.createTableRef(FromCompiler.java:289)
>   at org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.<init>(FromCompiler.java:210)
> {noformat}
> Investigating the column's type with sqlline, it is reported as
> "VARCHAR ARRAY" instead of "VARCHAR_ARRAY" (output truncated for brevity):
> {noformat}
> 0: jdbc:phoenix:localhost:2181:/hbase-unsecur> !columns ARRAY_TEST_TABLE
> +------------+-------------+------------------+--------------+------------+---------------+
> | TABLE_CAT  | TABLE_SCHEM | TABLE_NAME       | COLUMN_NAME  | DATA_TYPE  | TYPE_NAME     |
> +------------+-------------+------------------+--------------+------------+---------------+
> | null       | null        | ARRAY_TEST_TABLE | ID           | -5         | BIGINT        |
> | null       | null        | ARRAY_TEST_TABLE | STRING_ARRAY | 2003       | VARCHAR ARRAY |
> +------------+-------------+------------------+--------------+------------+---------------+
> {noformat}
> The PDataType class defines VARCHAR_ARRAY as such:
> {noformat}
> VARCHAR_ARRAY("VARCHAR_ARRAY", PDataType.ARRAY_TYPE_BASE + PDataType.VARCHAR.getSqlType(), PhoenixArray.class, null) { ... }
> {noformat}
> The first parameter there is the sqlTypeName, which is "VARCHAR_ARRAY", but
> the lookup appears to be performed with "VARCHAR ARRAY" (a space instead of
> an underscore).
> I'm not sure whether the fix is to change those values, or whether it lies
> deep inside MetaDataEndpointImpl, where the protobuf returned to the client
> on a getTable call is built.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)