[jira] [Resolved] (HBASE-22711) Spark connector doesn't use the given mapping when inserting data

Balazs Meszaros (JIRA) Mon, 22 Jul 2019 07:33:32 -0700


     [ 
https://issues.apache.org/jira/browse/HBASE-22711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Balazs Meszaros resolved HBASE-22711.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: connector-1.0.1

> Spark connector doesn't use the given mapping when inserting data
> -----------------------------------------------------------------
>
>                 Key: HBASE-22711
>                 URL: https://issues.apache.org/jira/browse/HBASE-22711
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase-connectors
>    Affects Versions: connector-1.0.0
>            Reporter: Balazs Meszaros
>            Assignee: Balazs Meszaros
>            Priority: Major
>             Fix For: connector-1.0.1
>
>
> In some cases a Spark DataFrame cannot be read back with the same mapping 
> that was used to write it. For example:
> {code:scala}
> val sql = spark.sqlContext
> val persons =
>     """[
>       |{"name": "alice", "age": 20, "height": 5, "email": "[email protected]"},
>       |{"name": "bob", "age": 23, "height": 6, "email": "[email protected]"},
>       |{"name": "carol", "age": 12, "email": "[email protected]", "height": 4.11}
>       |]
>     """.stripMargin
> val df = spark.read.json(Seq(persons).toDS)
> df.write
>   .format("org.apache.hadoop.hbase.spark")
>   .option("hbase.columns.mapping", "name STRING :key, age SHORT p:age, email STRING c:email, height FLOAT p:height")
>   .option("hbase.table", "person")
>   .option("hbase.spark.use.hbasecontext", false)
>   .save()
> {code}
> It cannot be read back with the same mapping:
> {code:scala}
> val df2 = sql.read
>   .format("org.apache.hadoop.hbase.spark")
>   .option("hbase.columns.mapping", "name STRING :key, age SHORT p:age, email STRING c:email, height FLOAT p:height")
>   .option("hbase.table", "person")
>   .option("hbase.spark.use.hbasecontext", false)
>   .load()
> df2.createOrReplaceTempView("tableView")
> val results = sql.sql("SELECT * FROM tableView")
> results.show()
> {code}
> The results:
> {noformat}
> +---+-----+---------+---------------+
> |age| name|   height|          email|
> +---+-----+---------+---------------+
> |  0|alice|   2.3125|[email protected]|
> |  0|  bob|    2.375|    [email protected]|
> |  0|carol|2.2568748|[email protected]|
> +---+-----+---------+---------------+
> {noformat}
> Spark stores integer values as long and floating-point values as double, so 
> the columns mapped as SHORT and FLOAT are each written to HBase as 8 bytes:
> {noformat}
> shell> scan 'person'
>  alice                column=p:age, timestamp=1563450714829, value=\x00\x00\x00\x00\x00\x00\x00\x14
>  alice                column=p:height, timestamp=1563450714829, value=@\x14\x00\x00\x00\x00\x00\x00
> {noformat}
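
The wrong values in the result table are consistent with an 8-byte cell being decoded using only its leading bytes. The sketch below reproduces that arithmetic with plain java.nio, assuming big-endian encoding (as the scan output suggests) and assuming the reader takes the first 2 bytes for SHORT and the first 4 bytes for FLOAT. The object and helper names are hypothetical and are not part of the connector.

```scala
import java.nio.ByteBuffer

// Hypothetical helpers for illustration only; not connector code.
object WidthMismatch {
  // Encode as big-endian 8-byte values, matching the cells shown by `scan`.
  def longBytes(v: Long): Array[Byte] =
    ByteBuffer.allocate(8).putLong(v).array()
  def doubleBytes(v: Double): Array[Byte] =
    ByteBuffer.allocate(8).putDouble(v).array()

  // Decode only the leading bytes, as a reader expecting the narrower
  // SHORT (2 bytes) or FLOAT (4 bytes) type would (an assumption).
  def leadingShort(b: Array[Byte]): Short = ByteBuffer.wrap(b).getShort()
  def leadingFloat(b: Array[Byte]): Float = ByteBuffer.wrap(b).getFloat()

  def main(args: Array[String]): Unit = {
    println(leadingShort(longBytes(20L)))   // alice's age: 0
    println(leadingFloat(doubleBytes(5.0))) // alice's height: 2.3125
    println(leadingFloat(doubleBytes(6.0))) // bob's height: 2.375
  }
}
```

Under these assumptions the decoded values match the age and height columns reported for alice and bob in the result table.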



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
