[ https://issues.apache.org/jira/browse/HBASE-17547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Busbey updated HBASE-17547:
--------------------------------
    Assignee: Chetan Khatri

> HBase-Spark Module : TableCatelog doesn't supports multiple columns from Single Column family
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-17547
>                 URL: https://issues.apache.org/jira/browse/HBASE-17547
>             Project: HBase
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 1.1.8
>            Reporter: Chetan Khatri
>            Assignee: Chetan Khatri
>
> Issue: HBase-Spark module: the table catalog does not support multiple columns from a single column family.
> Description:
> The DataSource API in the HBase-Spark module fails when accessing more than one column from the same column family.
> If the catalog maps multiple columns to a single column family (or to several column families), it throws an exception. For example:
> def empcatalog = s"""{
> |"table":{"namespace":"empschema", "name":"emp"},
> |"rowkey":"key",
> |"columns":{
> |"empNumber":{"cf":"rowkey", "col":"key", "type":"string"},
> |"city":{"cf":"pdata", "col":"city", "type":"string"},
> |"empName":{"cf":"pdata", "col":"name", "type":"string"},
> |"jobDesignation":{"cf":"pdata", "col":"designation", "type":"string"},
> |"salary":{"cf":"pdata", "col":"salary", "type":"string"}
> |}
> |}""".stripMargin
> Here, city, name, designation, and salary all come from the pdata column family.
> Exception while saving the DataFrame to HBase:
> java.lang.IllegalArgumentException: Family 'pdata' already exists so cannot be added
>         at org.apache.hadoop.hbase.HTableDescriptor.addFamily(HTableDescriptor.java:827)
>         at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:98)
>         at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:95)
>         at scala.collection.immutable.List.foreach(List.scala:381)
>         at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.createTable(HBaseRelation.scala:95)
>         at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:58)
>         at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:457)
>         at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
> The getColumnFamilies method in HBaseTableCatalog.scala returns duplicate column families; it should return each family only once.
> A unit test covering this case has been written in DefaultSourceSuite.scala, in the writeCatalog object definition.
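
The failure described above can be reproduced without an HBase cluster. The following is a minimal sketch, not the module's actual code: `Field`, `ColumnFamilySketch`, `familiesWithDuplicates`, `distinctFamilies`, and `registerFamilies` are hypothetical names introduced for illustration, and the assumption that the "rowkey" pseudo-family is filtered out is mine. It models the catalog from the report, shows that collecting one family name per column repeats "pdata", and shows that deduplicating the list avoids the "already exists" error.

```scala
// Hypothetical stand-in for one catalog column entry; not the module's real class.
final case class Field(name: String, cf: String, col: String)

object ColumnFamilySketch {
  // Assumption: the "rowkey" pseudo-family is not a real column family.
  val RowKeyFamily = "rowkey"

  // The columns from the empcatalog example in the report.
  val fields: Seq[Field] = Seq(
    Field("empNumber", "rowkey", "key"),
    Field("city", "pdata", "city"),
    Field("empName", "pdata", "name"),
    Field("jobDesignation", "pdata", "designation"),
    Field("salary", "pdata", "salary")
  )

  // One entry per column, so a family shared by several columns is repeated.
  def familiesWithDuplicates(fs: Seq[Field]): Seq[String] =
    fs.map(_.cf).filter(_ != RowKeyFamily)

  // One entry per family, keeping first-seen order.
  def distinctFamilies(fs: Seq[Field]): Seq[String] =
    familiesWithDuplicates(fs).distinct

  // Mimics a table descriptor that rejects a family added twice.
  // require throws IllegalArgumentException when its condition is false.
  def registerFamilies(families: Seq[String]): Set[String] =
    families.foldLeft(Set.empty[String]) { (seen, cf) =>
      require(!seen.contains(cf), s"Family '$cf' already exists so cannot be added")
      seen + cf
    }

  def main(args: Array[String]): Unit = {
    // Succeeds because each family is registered once.
    println(registerFamilies(distinctFamilies(fields)))
  }
}
```

Passing the output of `familiesWithDuplicates` to `registerFamilies` instead raises an `IllegalArgumentException` on the second "pdata", which mirrors the exception in the stack trace above.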