[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming
[ https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531058#comment-16531058 ]

Takeshi Yamamuro commented on SPARK-24727:
------------------------------------------

Does your streaming query change on every invocation?

> The cache 100 in CodeGenerator is too small for streaming
> ---------------------------------------------------------
>
>                 Key: SPARK-24727
>                 URL: https://issues.apache.org/jira/browse/SPARK-24727
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: ant_nebula
>            Priority: Major
>
> {code:java}
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator
> private val cache = CacheBuilder.newBuilder().maximumSize(100).build{code}
> The cache size of 100 in CodeGenerator is too small for realtime streaming
> calculation, although it is OK for offline calculation, because realtime
> streaming calculation is usually more complex within one driver and is
> performance sensitive.
> I suggest that Spark make this configurable by the user, with a default of
> 100, such as
> spark.codegen.cache=1000
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming
[ https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531107#comment-16531107 ]

ant_nebula commented on SPARK-24727:
------------------------------------

No. Spark runs DAG scheduling for each streaming batchDuration job. If the jobs of a single streaming batchDuration completely fill the 100-entry cache, the code that overflows it goes through Janino compilation again on every streaming batchDuration job.
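The effect described in this comment can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Spark's code: it uses a strict LRU map from the Java standard library as a stand-in for the Guava cache (whose eviction is only approximately LRU), and `CompileCacheSim`, `LruCache`, and `countCompiles` are hypothetical names. It counts cache misses ("compilations") when every batch requests the same distinct code fragments in the same order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CompileCacheSim {
    /** Strict LRU cache: drops the least recently accessed entry once maxSize is exceeded. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;

        LruCache(int maxSize) {
            super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxSize;
        }
    }

    /** Counts cache misses when each batch asks for fragments 0..fragmentsPerBatch-1 in order. */
    static int countCompiles(int cacheSize, int fragmentsPerBatch, int batches) {
        LruCache<Integer, String> cache = new LruCache<>(cacheSize);
        int compiles = 0;
        for (int b = 0; b < batches; b++) {
            for (int f = 0; f < fragmentsPerBatch; f++) {
                if (cache.get(f) == null) {          // miss: the fragment must be compiled
                    compiles++;
                    cache.put(f, "compiled-" + f);   // placeholder for the compiled class
                }
            }
        }
        return compiles;
    }

    public static void main(String[] args) {
        System.out.println(countCompiles(100, 100, 10));  // working set fits the cache
        System.out.println(countCompiles(100, 101, 10));  // one fragment too many
        System.out.println(countCompiles(1000, 101, 10)); // larger cache
    }
}
```

Under these assumptions, a working set that exceeds the cache by a single entry already causes a miss on every request, because each insertion evicts exactly the entry that will be needed next. A real workload and Guava's actual eviction policy may behave less severely.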
[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming
[ https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531144#comment-16531144 ]

Takeshi Yamamuro commented on SPARK-24727:
------------------------------------------

I see. It makes some sense to me. Any reason to hard-code the value? cc: [~smilegator] [~cloud_fan]
[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming
[ https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531224#comment-16531224 ]

Wenchen Fan commented on SPARK-24727:
-------------------------------------

It's because it was hard to access SQLConf on the executor side. I think we can do it now.
[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming
[ https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531225#comment-16531225 ]

Wenchen Fan commented on SPARK-24727:
-------------------------------------

BTW, this needs to be a static SQL conf. The CodeGenerator object is per JVM.
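A rough sketch of why a per-JVM object calls for a static setting: the cache is built once, so its size can only be taken from whatever value is visible at that moment, and later changes cannot resize it. The code below is an illustration under assumptions, not Spark's implementation: `CodegenCacheConfig`, `parse`, and `maxEntries` are hypothetical names, the key `spark.codegen.cache` is simply the name proposed in the issue, and a JVM system property stands in for the real configuration mechanism.

```java
public final class CodegenCacheConfig {
    // Key proposed in the issue; assumed here, not a confirmed Spark setting.
    static final String KEY = "spark.codegen.cache";
    static final int DEFAULT_MAX_ENTRIES = 100;

    // Fixed on first read for the lifetime of the JVM.
    private static Integer maxEntries;

    private CodegenCacheConfig() {}

    /** Parses the raw setting, falling back to the default when it is absent or blank. */
    static int parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_MAX_ENTRIES;
        }
        int n = Integer.parseInt(raw.trim());
        if (n < 0) {
            throw new IllegalArgumentException(KEY + " must be non-negative, got " + n);
        }
        return n;
    }

    /** Returns the value seen at first access; later property changes are ignored. */
    static synchronized int maxEntries() {
        if (maxEntries == null) {
            maxEntries = parse(System.getProperty(KEY));
        }
        return maxEntries;
    }

    public static void main(String[] args) {
        int first = maxEntries();
        System.setProperty(KEY, "5");      // too late: the value is already fixed
        System.out.println(first == maxEntries());
    }
}
```

Started with `-Dspark.codegen.cache=1000`, this sketch would size a cache at 1000 entries for the whole JVM, which mirrors the constraint raised in the comment.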
[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming
[ https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531376#comment-16531376 ]

Apache Spark commented on SPARK-24727:
--------------------------------------

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/21705