[ https://issues.apache.org/jira/browse/HBASE-17547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chetan Khatri updated HBASE-17547:
----------------------------------
Description:

Issue: HBase-Spark Module: TableCatalog doesn't support multiple columns from a single column family.

Description:
The Datasource API in the HBase-Spark module fails when accessing more than one column from the same column family. If your catalog maps multiple columns to a single column family (or to several), it throws an exception. For example:

def empcatalog = s"""{
    |"table":{"namespace":"empschema", "name":"emp"},
    |"rowkey":"key",
    |"columns":{
    |"empNumber":{"cf":"rowkey", "col":"key", "type":"string"},
    |"city":{"cf":"pdata", "col":"city", "type":"string"},
    |"empName":{"cf":"pdata", "col":"name", "type":"string"},
    |"jobDesignation":{"cf":"pdata", "col":"designation", "type":"string"},
    |"salary":{"cf":"pdata", "col":"salary", "type":"string"}
    |}
    |}""".stripMargin

Here, city, name, designation, and salary all come from the pdata column family.

Exception while saving the DataFrame to HBase:

java.lang.IllegalArgumentException: Family 'pdata' already exists so cannot be added
  at org.apache.hadoop.hbase.HTableDescriptor.addFamily(HTableDescriptor.java:827)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:98)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:95)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.createTable(HBaseRelation.scala:95)
  at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:58)
  at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:457)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)

The getColumnFamilies method in HBaseTableCatalog.scala returns duplicate family names; it should return each family only once.
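A minimal sketch of the deduplication behavior described above, using a hypothetical simplified `Field` model (the real HBaseTableCatalog and HBaseRelation classes are more involved than this):

```scala
// Hypothetical, simplified model of a catalog field mapping (not the real
// HBaseTableCatalog classes): each field maps a column family to a column.
case class Field(colFamily: String, col: String)

// Sketch of the expected getColumnFamilies behavior: each real column
// family is returned once, and the "rowkey" pseudo-family is excluded.
def distinctColumnFamilies(fields: Seq[Field]): Seq[String] =
  fields.map(_.colFamily).filter(_ != "rowkey").distinct

// The empcatalog mapping from the description above.
val fields = Seq(
  Field("rowkey", "key"),
  Field("pdata", "city"),
  Field("pdata", "name"),
  Field("pdata", "designation"),
  Field("pdata", "salary")
)

// With deduplication, HTableDescriptor.addFamily is called once per family,
// avoiding the "Family 'pdata' already exists" error.
println(distinctColumnFamilies(fields))
```

With duplicates removed, the five catalog fields above yield a single `pdata` family, so the table descriptor is built without the duplicate-family exception.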
A unit test has been written for this in DefaultSourceSuite.scala, in the writeCatalog object definition.

was:
Issue: HBase-Spark Module: TableCatalog doesn't support multiple columns from a single column family.

Description:
The Datasource API in the HBase-Spark module fails when accessing more than one column from the same column family. If your catalog maps multiple columns to a single column family (or to several), it throws an exception. For example:

def empcatalog = s"""{
    |"table":{"namespace":"empschema", "name":"emp"},
    |"rowkey":"key",
    |"columns":{
    |"empNumber":{"cf":"rowkey", "col":"key", "type":"string"},
    |"city":{"cf":"pdata", "col":"city", "type":"string"},
    |"empName":{"cf":"pdata", "col":"name", "type":"string"},
    |"jobDesignation":{"cf":"pdata", "col":"designation", "type":"string"},
    |"salary":{"cf":"pdata", "col":"salary", "type":"string"}
    |}
    |}""".stripMargin

Here, city, name, designation, and salary all come from the pdata column family.

Exception while saving the DataFrame to HBase:

java.lang.IllegalArgumentException: Family 'pdata' already exists so cannot be added
  at org.apache.hadoop.hbase.HTableDescriptor.addFamily(HTableDescriptor.java:827)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:98)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:95)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.createTable(HBaseRelation.scala:95)
  at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:58)
  at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:457)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)

The getColumnFamilies method in HBaseTableCatalog.scala returns duplicate family names; it should return each family only once.
A unit test had been written for this in HBaseTableCatalog.scala, in the writeCatalog object definition.

> HBase-Spark Module: TableCatalog doesn't support multiple columns from
> a single column family
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-17547
>                 URL: https://issues.apache.org/jira/browse/HBASE-17547
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase, spark
>    Affects Versions: 1.1.8
>            Reporter: Chetan Khatri
>             Fix For: 1.1.8
>
> Issue: HBase-Spark Module: TableCatalog doesn't support multiple columns
> from a single column family.
>
> Description:
> The Datasource API in the HBase-Spark module fails when accessing more
> than one column from the same column family. If your catalog maps
> multiple columns to a single column family (or to several), it throws
> an exception. For example:
>
> def empcatalog = s"""{
> |"table":{"namespace":"empschema", "name":"emp"},
> |"rowkey":"key",
> |"columns":{
> |"empNumber":{"cf":"rowkey", "col":"key", "type":"string"},
> |"city":{"cf":"pdata", "col":"city", "type":"string"},
> |"empName":{"cf":"pdata", "col":"name", "type":"string"},
> |"jobDesignation":{"cf":"pdata", "col":"designation", "type":"string"},
> |"salary":{"cf":"pdata", "col":"salary", "type":"string"}
> |}
> |}""".stripMargin
>
> Here, city, name, designation, and salary all come from the pdata column
> family.
>
> Exception while saving the DataFrame to HBase:
> java.lang.IllegalArgumentException: Family 'pdata' already exists so cannot be added
>   at org.apache.hadoop.hbase.HTableDescriptor.addFamily(HTableDescriptor.java:827)
>   at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:98)
>   at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:95)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.createTable(HBaseRelation.scala:95)
>   at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:58)
>   at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:457)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
>
> The getColumnFamilies method in HBaseTableCatalog.scala returns duplicate
> family names; it should return each family only once.
>
> A unit test has been written for this in DefaultSourceSuite.scala, in the
> writeCatalog object definition.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)