[ https://issues.apache.org/jira/browse/SPARK-20281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972756#comment-15972756 ]
Takeshi Yamamuro commented on SPARK-20281:
------------------------------------------

IIUC, they internally use the same value (that is, defaultParallelism) for splits by default. But I feel this difference in how the plans are printed may confuse users.

> Table-valued function range in SQL should use the same number of partitions as spark.range
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-20281
>                 URL: https://issues.apache.org/jira/browse/SPARK-20281
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Jacek Laskowski
>            Priority: Minor
>
> Note the different number of partitions in {{range}} in SQL and as operator.
> {code}
> scala> spark.range(4).explain
> == Physical Plan ==
> *Range (0, 4, step=1, splits=Some(8)) // <-- note Some(8)
>
> scala> sql("select * from range(4)").explain
> == Physical Plan ==
> *Range (0, 4, step=1, splits=None) // <-- note None
> {code}
> If I'm not mistaken, the fix is to change {{builtinFunctions}} in
> {{ResolveTableValuedFunctions}} (see
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala#L82-L93])
> to use {{sparkContext.defaultParallelism}}, as {{SparkSession.range}} does (see
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L517]).
> Please confirm, so I can work on a fix if and as needed.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
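To illustrate the comment's point, here is a minimal self-contained Scala sketch (not Spark's actual source) of why the two plans behave the same despite printing differently: a plan can carry {{splits}} as {{Some(n)}} or leave it {{None}}, and {{None}} falls back to the default parallelism when the number of partitions is resolved. The names {{effectiveSplits}} and {{defaultParallelism}} are illustrative assumptions, not Spark identifiers.

```scala
// Illustrative model only, not Spark internals: a Range plan carries an
// optional split count; None means "fall back to the default at resolution".
object RangeSplits {
  // Assumed cluster default, matching the Some(8) seen in the example plans.
  val defaultParallelism: Int = 8

  // Resolve the effective number of partitions: an explicit value wins,
  // otherwise the default parallelism is used.
  def effectiveSplits(splits: Option[Int]): Int =
    splits.getOrElse(defaultParallelism)

  def main(args: Array[String]): Unit = {
    // spark.range(4) embeds Some(8) in the plan; SQL range(4) leaves None.
    // Both resolve to the same partition count; only the printed plan differs.
    assert(effectiveSplits(Some(8)) == effectiveSplits(None))
    println(effectiveSplits(None))
  }
}
```

Under this reading, the proposed change only makes the SQL path record the default explicitly so the printed plans match; the runtime partitioning is already the same.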