[ https://issues.apache.org/jira/browse/PHOENIX-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manohar Chamaraju updated PHOENIX-5410:
---------------------------------------
    Summary: Phoenix spark to hbase connector takes long time persist data  (was: Phoenix spark takes long time persist data to hbase)

> Phoenix spark to hbase connector takes long time persist data
> -------------------------------------------------------------
>
>                 Key: PHOENIX-5410
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5410
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Manohar Chamaraju
>            Priority: Major
>
> While using the phoenix-spark connector 1.0.0-SNAPSHOT
> ([https://github.com/apache/phoenix-connectors/tree/master/phoenix-spark])
> to write to HBase, we found that writes were taking a very long time.
> Profiling the connector showed that 90% of CPU time is consumed in the
> SparkJdbcUtil.toRow() method.
> !https://files.slack.com/files-pri/T037D1PV9-FKYGD504A/image.png!
> Looking at the code, SparkJdbcUtil.toRow() is called for every field of a
> row, and a RowEncoder(schema).resolveAndBind() object is created on every
> iteration. Because of this, a large number of encoder objects are created
> and then collected by the GC, consuming CPU cycles and degrading
> performance.
> Moreover, SparkJdbcUtil.toRow() is called by PhoenixDataWriter.write(),
> where the writer's schema is the same for all rows, so the code can be
> optimized by avoiding the creation of these unnecessary objects, yielding a
> significant performance improvement.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
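The shape of the proposed optimization can be sketched as follows. This is an illustration in Java, not the actual connector code (which is Scala and depends on Spark); the class and method names below (RowConverter, writeSlow, writeFast) are hypothetical stand-ins for RowEncoder(schema).resolveAndBind() and PhoenixDataWriter.write(). The point it demonstrates is hoisting the construction of an expensive, schema-dependent object out of the per-row write path, since the schema is constant for the lifetime of the writer.

```java
import java.util.List;

public class EncoderHoisting {
    // Stand-in for RowEncoder(schema).resolveAndBind(): in the real
    // connector this object is expensive to build and was being created
    // on every call.
    static final class RowConverter {
        private final String schema;
        RowConverter(String schema) { this.schema = schema; }
        String toRow(String record) { return schema + ":" + record; }
    }

    // Anti-pattern reported in the profile: a new converter per record,
    // producing lots of short-lived garbage for the GC to collect.
    static int writeSlow(List<String> records, String schema) {
        int written = 0;
        for (String r : records) {
            RowConverter c = new RowConverter(schema); // allocated every iteration
            c.toRow(r);
            written++;
        }
        return written;
    }

    // Proposed fix: build the converter once, because the writer's schema
    // is the same for all rows, then reuse it in the loop.
    static int writeFast(List<String> records, String schema) {
        RowConverter c = new RowConverter(schema); // hoisted out of the loop
        int written = 0;
        for (String r : records) {
            c.toRow(r);
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> records = List.of("a", "b", "c");
        System.out.println(writeSlow(records, "s"));
        System.out.println(writeFast(records, "s"));
    }
}
```

Both paths produce identical output; only the allocation behavior differs, which is why the fix shows up as reduced CPU time in GC rather than a functional change.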