[ https://issues.apache.org/jira/browse/PHOENIX-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manohar Chamaraju updated PHOENIX-5410:
---------------------------------------
    Summary: Phoenix spark to hbase connector takes long time persist data  (was: Phoenix spark takes long time persist data to hbase)

> Phoenix spark to hbase connector takes long time persist data
> -------------------------------------------------------------
>
>                 Key: PHOENIX-5410
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5410
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Manohar Chamaraju
>            Priority: Major
>
> While using the phoenix-spark connector 1.0.0-SNAPSHOT
> ([https://github.com/apache/phoenix-connectors/tree/master/phoenix-spark])
> to write to HBase, we found that writes were taking a very long time.
> Profiling the connector showed that 90% of CPU time is consumed in the
> SparkJdbcUtil.toRow() method.
> !https://files.slack.com/files-pri/T037D1PV9-FKYGD504A/image.png!
> Looking at the code, SparkJdbcUtil.toRow() is called for every field of a
> row, and a RowEncoder(schema).resolveAndBind() object is created on every
> iteration. Because of this, a large number of encoder objects are created
> and then collected by the GC, consuming CPU cycles and degrading
> performance.
> Moreover, SparkJdbcUtil.toRow() is called by PhoenixDataWriter.write(),
> where the writer's schema is the same for all rows, so the code can be
> optimized by avoiding the creation of these unnecessary objects, yielding a
> significant performance improvement.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
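The shape of the proposed optimization can be sketched as follows. This is an illustration in Java, not the actual connector code (which is Scala and depends on Spark); the class and method names below (RowConverter, writeSlow, writeFast) are hypothetical stand-ins for RowEncoder(schema).resolveAndBind() and PhoenixDataWriter.write(). The point it demonstrates is hoisting the construction of an expensive, schema-dependent object out of the per-row write path, since the schema is constant for the lifetime of the writer.

```java
import java.util.List;

public class EncoderHoisting {
    // Stand-in for RowEncoder(schema).resolveAndBind(): in the real
    // connector this object is expensive to build and was being created
    // on every call.
    static final class RowConverter {
        private final String schema;
        RowConverter(String schema) { this.schema = schema; }
        String toRow(String record) { return schema + ":" + record; }
    }

    // Anti-pattern reported in the profile: a new converter per record,
    // producing lots of short-lived garbage for the GC to collect.
    static int writeSlow(List<String> records, String schema) {
        int written = 0;
        for (String r : records) {
            RowConverter c = new RowConverter(schema); // allocated every iteration
            c.toRow(r);
            written++;
        }
        return written;
    }

    // Proposed fix: build the converter once, because the writer's schema
    // is the same for all rows, then reuse it in the loop.
    static int writeFast(List<String> records, String schema) {
        RowConverter c = new RowConverter(schema); // hoisted out of the loop
        int written = 0;
        for (String r : records) {
            c.toRow(r);
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> records = List.of("a", "b", "c");
        System.out.println(writeSlow(records, "s"));
        System.out.println(writeFast(records, "s"));
    }
}
```

Both paths produce identical output; only the allocation behavior differs, which is why the fix shows up as reduced CPU time in GC rather than a functional change.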