[ 
https://issues.apache.org/jira/browse/HBASE-17547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Khatri updated HBASE-17547:
----------------------------------
    Description: 
Issue: HBase-Spark Module: TableCatalog doesn't support multiple columns from a
single column family.

Description:
The Datasource API in the HBase-Spark module fails when more than one column is
mapped to the same column family. If the catalog defines multiple columns drawn
from a single (or from multiple) column family, an exception is thrown. For
example:

def empcatalog = s"""{
|"table":{"namespace":"empschema", "name":"emp"},
|"rowkey":"key",
|"columns":{
|"empNumber":{"cf":"rowkey", "col":"key", "type":"string"},
|"city":{"cf":"pdata", "col":"city", "type":"string"},
|"empName":{"cf":"pdata", "col":"name", "type":"string"},
|"jobDesignation":{"cf":"pdata", "col":"designation", "type":"string"},
|"salary":{"cf":"pdata", "col":"salary", "type":"string"}
|}
|}""".stripMargin

Here, city, name, designation, and salary all belong to the pdata column family.

The following exception is thrown while saving the DataFrame to HBase:

java.lang.IllegalArgumentException: Family 'pdata' already exists so cannot be added
at org.apache.hadoop.hbase.HTableDescriptor.addFamily(HTableDescriptor.java:827)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:98)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:95)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.createTable(HBaseRelation.scala:95)
at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:58)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:457)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
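The failure mechanics can be reproduced in miniature without Spark or HBase: HTableDescriptor.addFamily rejects a family that has already been added, so any caller that feeds it the raw (duplicated) family list from the catalog fails on the second pdata entry. A plain-Scala sketch of this behavior (the TableDescriptor class below is a stand-in for illustration, not the real HBase API):

```scala
import scala.util.Try

object DuplicateFamilySketch {
  // column -> column family, as declared in empcatalog (rowkey omitted)
  val columnToFamily: Seq[(String, String)] = Seq(
    "city"           -> "pdata",
    "empName"        -> "pdata",
    "jobDesignation" -> "pdata",
    "salary"         -> "pdata"
  )

  // Stand-in for HTableDescriptor: addFamily throws if the family exists.
  final class TableDescriptor {
    private var families = Set.empty[String]
    def addFamily(cf: String): Unit = {
      require(!families.contains(cf),
        s"Family '$cf' already exists so cannot be added")
      families += cf
    }
  }

  def main(args: Array[String]): Unit = {
    val desc = new TableDescriptor
    // Mimics createTable iterating over a family list with duplicates:
    // the second "pdata" triggers the IllegalArgumentException.
    val result = Try(columnToFamily.map(_._2).foreach(desc.addFamily))
    println(result.isFailure) // prints "true"
  }
}
```

Deduplicating the family list before the addFamily loop avoids the exception, which is what the proposed fix does.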

The getColumnFamilies method in HBaseTableCatalog.scala returns duplicate
column families; it should return each family only once.
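A minimal sketch of the intended behavior, assuming getColumnFamilies should yield each real column family exactly once (the helper below is illustrative, not the actual patch):

```scala
object DedupedFamilies {
  // Illustrative stand-in for HBaseTableCatalog.getColumnFamilies:
  // map each column to its family, drop the "rowkey" pseudo-family,
  // and return each remaining family only once.
  def getColumnFamilies(columnToFamily: Seq[(String, String)]): Seq[String] =
    columnToFamily.map(_._2).filter(_ != "rowkey").distinct

  def main(args: Array[String]): Unit = {
    val cols = Seq(
      "empNumber"      -> "rowkey",
      "city"           -> "pdata",
      "empName"        -> "pdata",
      "jobDesignation" -> "pdata",
      "salary"         -> "pdata"
    )
    println(getColumnFamilies(cols)) // prints "List(pdata)"
  }
}
```

With the families deduplicated, the addFamily loop in HBaseRelation.createTable runs at most once per family and the IllegalArgumentException no longer occurs.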

A unit test covering this case has been added to DefaultSourceSuite.scala (see
the writeCatalog object definition).

> HBase-Spark Module : TableCatalog doesn't support multiple columns from a
> single column family
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-17547
>                 URL: https://issues.apache.org/jira/browse/HBASE-17547
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase, spark
>    Affects Versions: 1.1.8
>            Reporter: Chetan Khatri
>             Fix For: 1.1.8
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
