[
https://issues.apache.org/jira/browse/PHOENIX-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lars Hofhansl updated PHOENIX-5410:
-----------------------------------
Fix Version/s: 5.1.0
> Phoenix Spark to HBase connector takes a long time to persist data
> ------------------------------------------------------------------
>
> Key: PHOENIX-5410
> URL: https://issues.apache.org/jira/browse/PHOENIX-5410
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: connectors-1.0.0
> Reporter: Manohar Chamaraju
> Priority: Major
> Fix For: 5.1.0
>
> Attachments: PHOENIX-5410.patch
>
>
> While using the phoenix-spark connector 1.0.0-SNAPSHOT
> ([https://github.com/apache/phoenix-connectors/tree/master/phoenix-spark])
> to write to HBase, we found that writes took a very long time.
> Profiling the connector showed that 90% of the CPU time is spent in the
> SparkJdbcUtil.toRow() method.
> !https://files.slack.com/files-pri/T037D1PV9-FKYGD504A/image.png!
> Looking into the code, SparkJdbcUtil.toRow() is called for every field of
> every row, and a RowEncoder(schema).resolveAndBind() object is created on
> each call. As a result, a large number of short-lived encoder objects are
> allocated and then collected by the GC, wasting CPU cycles and degrading
> performance, as sketched below.
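> For illustration, the hot path can be sketched as follows (a hypothetical
> reconstruction assuming the Spark 2.x encoder API; the actual
> SparkJdbcUtil code may differ in detail):
> {code:scala}
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.catalyst.InternalRow
> import org.apache.spark.sql.catalyst.encoders.RowEncoder
> import org.apache.spark.sql.types.StructType
>
> // Called once per field of every row: resolveAndBind() builds a fresh
> // encoder (a whole new expression tree) on each call, so millions of
> // short-lived objects are allocated and immediately become garbage.
> def toRow(schema: StructType, internalRow: InternalRow): Row = {
>   val encoder = RowEncoder(schema).resolveAndBind()
>   encoder.fromRow(internalRow)
> }
> {code}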
> Moreover, SparkJdbcUtil.toRow() is called from PhoenixDataWriter.write(),
> where the schema is the same for every row the writer handles, so the code
> can be optimized to avoid creating these unnecessary objects and recover a
> large share of the lost performance; see the sketch below.
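> A minimal sketch of the optimization, under the same Spark 2.x assumption
> (the class name CachedRowConverter is hypothetical; the attached patch may
> structure this differently): the resolved encoder is created once per
> writer and reused for every row and field.
> {code:scala}
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.catalyst.InternalRow
> import org.apache.spark.sql.catalyst.encoders.{ExpressionEncoder, RowEncoder}
> import org.apache.spark.sql.types.StructType
>
> // The schema is fixed for the lifetime of a writer, so resolve and bind
> // the encoder once and reuse it; no per-field or per-row allocation.
> class CachedRowConverter(schema: StructType) {
>   private val encoder: ExpressionEncoder[Row] =
>     RowEncoder(schema).resolveAndBind()
>
>   def toRow(internalRow: InternalRow): Row = encoder.fromRow(internalRow)
> }
> {code}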
>
> With the changes in the attached patch, the time required for the write
> dropped from 30 minutes to less than 40 seconds in our test environment.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)