[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming

2018-07-03 Thread Takeshi Yamamuro (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531058#comment-16531058
 ] 

Takeshi Yamamuro commented on SPARK-24727:
--

Does the query in your streaming job change each time a calculation is invoked?

> The cache 100 in CodeGenerator is too small for streaming
> -
>
> Key: SPARK-24727
> URL: https://issues.apache.org/jira/browse/SPARK-24727
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: ant_nebula
>Priority: Major
>
> {code:java}
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator 
> private val cache = CacheBuilder.newBuilder().maximumSize(100).build{code}
> The cache size of 100 in CodeGenerator is too small for real-time streaming
> computation, although it is fine for offline computation, because real-time
> streaming computation usually runs more complex work in a single driver and
> is performance sensitive.
> I suggest that Spark make this size configurable, with a default of 100,
> for example spark.codegen.cache=1000
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming

2018-07-03 Thread ant_nebula (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531107#comment-16531107
 ] 

ant_nebula commented on SPARK-24727:


No. Spark schedules a DAG for each streaming batchDuration job.

If the jobs of a single streaming batchDuration completely fill the 100-entry 
cache, the code that overflows it is recompiled by Janino in every streaming 
batchDuration job.
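
The eviction pattern described in this comment can be illustrated with a 
minimal sketch. It assumes a plain size-bounded LRU map 
(java.util.LinkedHashMap in access order) as a stand-in for the Guava cache in 
CodeGenerator, and a counter in place of a real Janino compilation. The class 
and method names are hypothetical and are not Spark's. Guava's eviction is only 
approximately LRU, so real miss counts may differ from this idealized model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in for the CodeGenerator cache: a size-bounded LRU map.
// All names are hypothetical; Spark itself uses a Guava cache and Janino.
final class CodegenCacheSketch {

    // Replays `batches` streaming batches, each requesting the same
    // `distinctSnippets` pieces of generated code in the same order, and
    // returns how many requests missed the cache and needed a "compile".
    static int countCompiles(int cacheSize, int distinctSnippets, int batches) {
        final int maxEntries = cacheSize;
        // accessOrder = true keeps entries ordered least-recently-used first.
        Map<Integer, String> cache =
            new LinkedHashMap<Integer, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                    // Evict the least-recently-used entry once over capacity.
                    return size() > maxEntries;
                }
            };
        int compiles = 0;
        for (int batch = 0; batch < batches; batch++) {
            for (int snippet = 0; snippet < distinctSnippets; snippet++) {
                if (cache.get(snippet) == null) {
                    compiles++; // stands in for one Janino compilation
                    cache.put(snippet, "compiled-" + snippet);
                }
            }
        }
        return compiles;
    }

    public static void main(String[] args) {
        System.out.println(countCompiles(100, 100, 10));  // prints 100
        System.out.println(countCompiles(100, 101, 10));  // prints 1010
        System.out.println(countCompiles(1000, 101, 10)); // prints 101
    }
}
```

In this model, a batch that needs exactly 100 snippets compiles each one once, 
while a batch that needs 101 snippets misses on every request in every batch, 
because each insertion evicts the entry that is needed next. Raising the 
capacity above the working set removes the repeated compilation.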




[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming

2018-07-03 Thread Takeshi Yamamuro (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531144#comment-16531144
 ] 

Takeshi Yamamuro commented on SPARK-24727:
--

I see. It makes some sense to me. Any reason to hard-code the value? cc: 
[~smilegator] [~cloud_fan]




[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming

2018-07-03 Thread Wenchen Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531224#comment-16531224
 ] 

Wenchen Fan commented on SPARK-24727:
-

The value was hard-coded because it used to be hard to access SQLConf on the 
executor side. I think we can make it configurable now.




[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming

2018-07-03 Thread Wenchen Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531225#comment-16531225
 ] 

Wenchen Fan commented on SPARK-24727:
-

BTW, this needs to be a static SQL conf, because the CodeGenerator object is per JVM.
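
One way to read this remark, sketched under the assumption that the cache is 
built once per JVM and never resized: the size is captured when the per-JVM 
object is created, so changing the setting afterwards has no effect. The key 
name and classes below are hypothetical and exist only for this illustration. 
They are not Spark's configuration API.

```java
import java.util.HashMap;
import java.util.Map;

final class StaticConfSketch {
    // Hypothetical key, used only in this sketch.
    static final String KEY = "sketch.codegen.cache.maxEntries";

    // Stand-in for a per-JVM singleton: the capacity is read once, at construction.
    static final class JvmWideCache {
        private final int maxEntries;

        JvmWideCache(Map<String, String> conf) {
            // Fall back to 100 when the key is not set.
            this.maxEntries = Integer.parseInt(conf.getOrDefault(KEY, "100"));
        }

        int maxEntries() {
            return maxEntries;
        }
    }

    // Returns the capacity seen before and after a runtime change to the conf.
    static int[] demo() {
        Map<String, String> conf = new HashMap<>();
        conf.put(KEY, "1000");
        JvmWideCache cache = new JvmWideCache(conf);
        int before = cache.maxEntries();
        conf.put(KEY, "5000"); // a later, session-style change
        int after = cache.maxEntries();
        return new int[] {before, after};
    }
}
```

Here demo() returns {1000, 1000}: the later change to 5000 is never seen by 
the object that already exists, which is why a setting that can only be fixed 
at startup fits a per-JVM object better than one that can change per session.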




[jira] [Commented] (SPARK-24727) The cache 100 in CodeGenerator is too small for streaming

2018-07-03 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531376#comment-16531376
 ] 

Apache Spark commented on SPARK-24727:
--

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/21705
