[ https://issues.apache.org/jira/browse/HBASE-22711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Balazs Meszaros resolved HBASE-22711.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: connector-1.0.1

> Spark connector doesn't use the given mapping when inserting data
> -----------------------------------------------------------------
>
>                 Key: HBASE-22711
>                 URL: https://issues.apache.org/jira/browse/HBASE-22711
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase-connectors
>    Affects Versions: connector-1.0.0
>            Reporter: Balazs Meszaros
>            Assignee: Balazs Meszaros
>            Priority: Major
>             Fix For: connector-1.0.1
>
>
> In some cases a Spark DataFrame cannot be read back with the same mapping
> it was written with. For example:
> {code:scala}
> val sql = spark.sqlContext
> val persons =
>   """[
>     |{"name": "alice", "age": 20, "height": 5, "email": "al...@alice.com"},
>     |{"name": "bob", "age": 23, "height": 6, "email": "b...@bob.com"},
>     |{"name": "carol", "age": 12, "email": "ca...@carol.com", "height": 4.11}
>     |]
>   """.stripMargin
> val df = spark.read.json(Seq(persons).toDS)
> df.write
>   .format("org.apache.hadoop.hbase.spark")
>   .option("hbase.columns.mapping", "name STRING :key, age SHORT p:age, email STRING c:email, height FLOAT p:height")
>   .option("hbase.table", "person")
>   .option("hbase.spark.use.hbasecontext", false)
>   .save()
> {code}
> It cannot be read back with the same mapping:
> {code:scala}
> val df2 = sql.read
>   .format("org.apache.hadoop.hbase.spark")
>   .option("hbase.columns.mapping", "name STRING :key, age SHORT p:age, email STRING c:email, height FLOAT p:height")
>   .option("hbase.table", "person")
>   .option("hbase.spark.use.hbasecontext", false)
>   .load()
> df2.createOrReplaceTempView("tableView")
> val results = sql.sql("SELECT * FROM tableView")
> results.show()
> {code}
> The results:
> {noformat}
> +---+-----+---------+---------------+
> |age| name|   height|          email|
> +---+-----+---------+---------------+
> |  0|alice|   2.3125|al...@alice.com|
> |  0|  bob|    2.375|   b...@bob.com|
> |  0|carol|2.2568748|ca...@carol.com|
> +---+-----+---------+---------------+
> {noformat}
> Spark stores integer values as long and floating-point values as double, so
> the connector writes both the SHORT and the FLOAT column as 8 bytes in HBase
> instead of the 2 and 4 bytes the mapping declares:
> {noformat}
> shell> scan 'person'
>  alice   column=p:age, timestamp=1563450714829, value=\x00\x00\x00\x00\x00\x00\x00\x14
>  alice   column=p:height, timestamp=1563450714829, value=@\x14\x00\x00\x00\x00\x00\x00
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
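
The width mismatch described in the report can be reproduced without Spark or HBase. The following is a minimal sketch, not the connector's code: it assumes the values are serialized big-endian (which the scan output above is consistent with) and that a reader expecting SHORT or FLOAT decodes only the leading 2 or 4 bytes of the cell. The object and helper names are hypothetical and exist only for illustration.

```scala
import java.nio.ByteBuffer

// Hypothetical helper object; not part of hbase-connectors.
object WidthMismatch {
  // Encode as the widened types Spark uses (8 bytes, big-endian by default).
  def longBytes(v: Long): Array[Byte] =
    ByteBuffer.allocate(8).putLong(v).array()

  def doubleBytes(v: Double): Array[Byte] =
    ByteBuffer.allocate(8).putDouble(v).array()

  // Decode only the leading bytes, as a reader expecting SHORT / FLOAT would
  // (assumption: the reader does not check the cell length).
  def leadingShort(b: Array[Byte]): Short = ByteBuffer.wrap(b).getShort()

  def leadingFloat(b: Array[Byte]): Float = ByteBuffer.wrap(b).getFloat()

  def main(args: Array[String]): Unit = {
    // age 20 written as a long, read back as a short
    println(leadingShort(longBytes(20L)))   // prints 0
    // height 5 written as a double, read back as a float
    println(leadingFloat(doubleBytes(5.0))) // prints 2.3125
  }
}
```

Under these assumptions the sketch yields the same `0` and `2.3125` that the report shows for alice, which suggests the read path itself is fine and the write path is what ignores the declared column types.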