[ https://issues.apache.org/jira/browse/SPARK-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin resolved SPARK-15680.
---------------------------------
    Resolution: Fixed
    Fix Version/s: 2.0.0

> Disable comments in generated code in order to avoid performance issues
> -----------------------------------------------------------------------
>
>          Key: SPARK-15680
>          URL: https://issues.apache.org/jira/browse/SPARK-15680
>      Project: Spark
>   Issue Type: Bug
>   Components: SQL
>     Reporter: Josh Rosen
>     Assignee: Josh Rosen
>      Fix For: 2.0.0
>
>  Attachments: pasted_image_at_2016_05_29_03_02_pm.png,
>               pasted_image_at_2016_05_29_03_03_pm.png
>
>
> In benchmarks involving tables with very wide and complex schemas (thousands
> of columns, deep nesting), I noticed that significant amounts of time (order
> of tens of seconds per task) were being spent generating comments during the
> code generation phase.
>
> The root cause of the performance problem stems from the fact that calling
> {{toString()}} on a complex expression can involve thousands of string
> concatenations, resulting in huge amounts (tens of gigabytes) of character
> array allocation and copying (see attached profiler screenshots).
>
> In the long term, we can avoid this problem by passing StringBuilders down
> the tree and using them to accumulate output. In the short term, however, I
> think that we should just disable comments in the generated code by default
> since very long comments are typically not useful debugging aids (since
> they're truncated for display anyways).
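
For readers unfamiliar with the long-term fix mentioned in the issue (passing StringBuilders down the tree), the following is a minimal sketch of the idea. It is not Spark's code: `ExprRenderSketch`, `Node`, `renderByConcat`, `renderInto` and `render` are hypothetical names invented for illustration, and the tiny tree stands in for a real expression tree. The naive method builds a new String at every level, so text produced deep in the tree is copied again at each ancestor; the builder variant appends each piece once to a shared buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical expression tree for illustration; not Spark's Expression classes.
public class ExprRenderSketch {

    /** A node with a name and zero or more children. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            for (Node k : kids) {
                children.add(k);
            }
        }

        /** Naive rendering: every level allocates and copies a fresh String. */
        String renderByConcat() {
            String out = name;
            if (!children.isEmpty()) {
                out += "(";
                for (int i = 0; i < children.size(); i++) {
                    if (i > 0) {
                        out += ", ";
                    }
                    // The child's full text is copied into this level's String.
                    out += children.get(i).renderByConcat();
                }
                out += ")";
            }
            return out;
        }

        /** Shared-builder rendering: each node appends to the same buffer. */
        void renderInto(StringBuilder sb) {
            sb.append(name);
            if (!children.isEmpty()) {
                sb.append('(');
                for (int i = 0; i < children.size(); i++) {
                    if (i > 0) {
                        sb.append(", ");
                    }
                    children.get(i).renderInto(sb);
                }
                sb.append(')');
            }
        }
    }

    /** Convenience wrapper: one builder for the whole tree. */
    static String render(Node root) {
        StringBuilder sb = new StringBuilder();
        root.renderInto(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Node expr = new Node("add",
                new Node("a"),
                new Node("mul", new Node("b"), new Node("c")));
        System.out.println(expr.renderByConcat()); // add(a, mul(b, c))
        System.out.println(render(expr));          // add(a, mul(b, c))
    }
}
```

Both methods produce the same text; the difference is only in how much intermediate character data is allocated and copied, which is what grows large on very wide or deeply nested trees.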