[spark] Git Push Summary
Repository: spark
Updated Tags:  refs/tags/v2.2.1 [created] e30e2698a
spark git commit: [SPARK-22688][SQL] Upgrade Janino version to 3.0.8
Repository: spark Updated Branches: refs/heads/master f110a7f88 -> 8ae004b46 [SPARK-22688][SQL] Upgrade Janino version to 3.0.8 ## What changes were proposed in this pull request? This PR upgrade Janino version to 3.0.8. [Janino 3.0.8](https://janino-compiler.github.io/janino/changelog.html) includes an important fix to reduce the number of constant pool entries by using 'sipush' java bytecode. * SIPUSH bytecode is not used for short integer constant [#33](https://github.com/janino-compiler/janino/issues/33). Please see detail in [this discussion thread](https://github.com/apache/spark/pull/19518#issuecomment-346674976). ## How was this patch tested? Existing tests Author: Kazuaki IshizakiCloses #19890 from kiszk/SPARK-22688. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8ae004b4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8ae004b4 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8ae004b4 Branch: refs/heads/master Commit: 8ae004b4602266d1f210e4c1564246d590412c06 Parents: f110a7f Author: Kazuaki Ishizaki Authored: Wed Dec 6 16:15:25 2017 -0800 Committer: gatorsmile Committed: Wed Dec 6 16:15:25 2017 -0800 -- dev/deps/spark-deps-hadoop-2.6 | 4 ++-- dev/deps/spark-deps-hadoop-2.7 | 4 ++-- pom.xml| 2 +- .../spark/sql/catalyst/expressions/codegen/CodeGenerator.scala | 6 +++--- .../main/scala/org/apache/spark/sql/execution/SparkPlan.scala | 4 ++-- 5 files changed, 10 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8ae004b4/dev/deps/spark-deps-hadoop-2.6 -- diff --git a/dev/deps/spark-deps-hadoop-2.6 b/dev/deps/spark-deps-hadoop-2.6 index 3b5a694..1831f33 100644 --- a/dev/deps/spark-deps-hadoop-2.6 +++ b/dev/deps/spark-deps-hadoop-2.6 @@ -35,7 +35,7 @@ commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.10.jar commons-collections-3.2.2.jar -commons-compiler-3.0.7.jar +commons-compiler-3.0.8.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-crypto-1.0.0.jar @@ -96,7 +96,7 @@ jackson-mapper-asl-1.9.13.jar jackson-module-paranamer-2.7.9.jar jackson-module-scala_2.11-2.6.7.1.jar jackson-xc-1.9.13.jar -janino-3.0.7.jar +janino-3.0.8.jar java-xmlbuilder-1.1.jar javassist-3.18.1-GA.jar javax.annotation-api-1.2.jar http://git-wip-us.apache.org/repos/asf/spark/blob/8ae004b4/dev/deps/spark-deps-hadoop-2.7 -- diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7 index 64136ba..fe14c05 100644 --- a/dev/deps/spark-deps-hadoop-2.7 +++ b/dev/deps/spark-deps-hadoop-2.7 @@ -35,7 +35,7 @@ commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.10.jar commons-collections-3.2.2.jar -commons-compiler-3.0.7.jar +commons-compiler-3.0.8.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-crypto-1.0.0.jar @@ -96,7 +96,7 @@ jackson-mapper-asl-1.9.13.jar jackson-module-paranamer-2.7.9.jar jackson-module-scala_2.11-2.6.7.1.jar jackson-xc-1.9.13.jar -janino-3.0.7.jar +janino-3.0.8.jar java-xmlbuilder-1.1.jar javassist-3.18.1-GA.jar javax.annotation-api-1.2.jar http://git-wip-us.apache.org/repos/asf/spark/blob/8ae004b4/pom.xml -- diff --git a/pom.xml b/pom.xml index 07bca9d..52db79e 100644 --- a/pom.xml +++ b/pom.xml @@ -170,7 +170,7 @@ 3.5 3.2.10 -3.0.7 +3.0.8 2.22.2 2.9.3 3.5.2 http://git-wip-us.apache.org/repos/asf/spark/blob/8ae004b4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala -- diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala index 670c82e..5c9e604 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala @@ -29,7 +29,7 @@ import scala.util.control.NonFatal import com.google.common.cache.{CacheBuilder, CacheLoader} import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException} import org.codehaus.commons.compiler.CompileException -import org.codehaus.janino.{ByteArrayClassLoader, ClassBodyEvaluator, JaninoRuntimeException, SimpleCompiler}
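The constant-pool savings matter most for queries whose generated code carries many small integer literals, for example very wide projections. A minimal sketch of such a workload, assuming an existing SparkSession named `spark` (illustrative only, not part of the commit):

```
// Illustrative only -- not from this commit. Assumes a SparkSession `spark`.
import org.apache.spark.sql.functions.lit

// Each literal column adds small int constants to the generated Java source; with
// Janino 3.0.8 these can be emitted as SIPUSH immediates instead of constant-pool
// backed LDC loads, keeping the generated class further from the 64K constant-pool limit.
val wideColumns = (0 until 1000).map(i => lit(i).as(s"c$i"))
val wide = spark.range(10).select(wideColumns: _*)
wide.show(1)
```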
spark git commit: [SPARK-22693][SQL] CreateNamedStruct and InSet should not use global variables
Repository: spark Updated Branches: refs/heads/master 9948b860a -> f110a7f88 [SPARK-22693][SQL] CreateNamedStruct and InSet should not use global variables ## What changes were proposed in this pull request? CreateNamedStruct and InSet are using a global variable which is not needed. This can generate some unneeded entries in the constant pool. The PR removes the unnecessary mutable states and makes them local variables. ## How was this patch tested? added UT Author: Marco GaidoAuthor: Marco Gaido Closes #19896 from mgaido91/SPARK-22693. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f110a7f8 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f110a7f8 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f110a7f8 Branch: refs/heads/master Commit: f110a7f884cb09f01a20462038328ddc5662b46f Parents: 9948b86 Author: Marco Gaido Authored: Wed Dec 6 14:12:16 2017 -0800 Committer: gatorsmile Committed: Wed Dec 6 14:12:16 2017 -0800 -- .../expressions/complexTypeCreator.scala| 27 +++- .../sql/catalyst/expressions/predicates.scala | 22 .../catalyst/expressions/ComplexTypeSuite.scala | 7 + .../catalyst/expressions/PredicateSuite.scala | 7 + 4 files changed, 40 insertions(+), 23 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f110a7f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala index 087b210..3dc2ee0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala @@ -356,22 +356,25 @@ case class CreateNamedStruct(children: Seq[Expression]) extends CreateNamedStruc override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val rowClass = classOf[GenericInternalRow].getName val values = ctx.freshName("values") -ctx.addMutableState("Object[]", values, s"$values = null;") +val valCodes = valExprs.zipWithIndex.map { case (e, i) => + val eval = e.genCode(ctx) + s""" + |${eval.code} + |if (${eval.isNull}) { + | $values[$i] = null; + |} else { + | $values[$i] = ${eval.value}; + |} + """.stripMargin +} val valuesCode = ctx.splitExpressionsWithCurrentInputs( - valExprs.zipWithIndex.map { case (e, i) => -val eval = e.genCode(ctx) -s""" - ${eval.code} - if (${eval.isNull}) { -$values[$i] = null; - } else { -$values[$i] = ${eval.value}; - }""" - }) + expressions = valCodes, + funcName = "createNamedStruct", + extraArguments = "Object[]" -> values :: Nil) ev.copy(code = s""" - |$values = new Object[${valExprs.size}]; + |Object[] $values = new Object[${valExprs.size}]; |$valuesCode |final InternalRow ${ev.value} = new $rowClass($values); |$values = null; http://git-wip-us.apache.org/repos/asf/spark/blob/f110a7f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala index 04e6694..a42dd7e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala @@ -344,17 
+344,17 @@ case class InSet(child: Expression, hset: Set[Any]) extends UnaryExpression with } else { "" } -ctx.addMutableState(setName, setTerm, - s"$setTerm = (($InSetName)references[${ctx.references.size - 1}]).getSet();") -ev.copy(code = s""" - ${childGen.code} - boolean ${ev.isNull} = ${childGen.isNull}; - boolean ${ev.value} = false; - if (!${ev.isNull}) { -${ev.value} = $setTerm.contains(${childGen.value}); -$setNull - } - """) +ev.copy(code = + s""" + |${childGen.code} + |${ctx.JAVA_BOOLEAN} ${ev.isNull} = ${childGen.isNull}; + |${ctx.JAVA_BOOLEAN} ${ev.value} = false; + |if (!${ev.isNull}) { +
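For reference, these two expressions back common user-facing operations: `struct()` ends up as a CreateNamedStruct, and a sufficiently long `isin()` list is optimized into an InSet. A hedged usage sketch, assuming a SparkSession `spark` (not taken from the commit):

```
// Hedged usage sketch -- not from this commit. Assumes a SparkSession `spark`.
import org.apache.spark.sql.functions.{col, struct}

val df = spark.range(100).select(col("id").as("a"), (col("id") * 2).as("b"))

// struct() builds a CreateNamedStruct; the long isin() list is rewritten to InSet
// (by default once it exceeds spark.sql.optimizer.inSetConversionThreshold = 10 items).
val out = df.select(
  struct(col("a"), col("b")).as("s"),
  col("a").isin(0L to 20L: _*).as("hit"))
out.show(5)
```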
spark git commit: [SPARK-22516][SQL] Bump up Univocity version to 2.5.9
Repository: spark Updated Branches: refs/heads/master effca9868 -> 9948b860a [SPARK-22516][SQL] Bump up Univocity version to 2.5.9 ## What changes were proposed in this pull request? There was a bug in Univocity Parser that causes the issue in SPARK-22516. This was fixed by upgrading from 2.5.4 to 2.5.9 version of the library : **Executing** ``` spark.read.option("header","true").option("inferSchema", "true").option("multiLine", "true").option("comment", "g").csv("test_file_without_eof_char.csv").show() ``` **Before** ``` ERROR Executor: Exception in task 0.0 in stage 6.0 (TID 6) com.univocity.parsers.common.TextParsingException: java.lang.IllegalArgumentException - Unable to skip 1 lines from line 2. End of input reached ... Internal state when error was thrown: line=3, column=0, record=2, charIndex=31 at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:339) at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:475) at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anon$1.next(UnivocityParser.scala:281) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) ``` **After** ``` +---+---+ |column1|column2| +---+---+ |abc|def| +---+---+ ``` ## How was this patch tested? The already existing `CSVSuite.commented lines in CSV data` test was extended to parse the file also in multiline mode. The test input file was modified to also include a comment in the last line. Author: smurakoziCloses #19906 from smurakozi/SPARK-22516. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9948b860 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9948b860 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9948b860 Branch: refs/heads/master Commit: 9948b860aca42bbc6478ddfbc0ff590adb00c2f3 Parents: effca98 Author: smurakozi Authored: Wed Dec 6 13:22:08 2017 -0800 Committer: Marcelo Vanzin Committed: Wed Dec 6 13:22:08 2017 -0800 -- dev/deps/spark-deps-hadoop-2.6 | 2 +- dev/deps/spark-deps-hadoop-2.7 | 2 +- sql/core/pom.xml| 2 +- .../src/test/resources/test-data/comments.csv | 1 + .../execution/datasources/csv/CSVSuite.scala| 23 +++- 5 files changed, 17 insertions(+), 13 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9948b860/dev/deps/spark-deps-hadoop-2.6 -- diff --git a/dev/deps/spark-deps-hadoop-2.6 b/dev/deps/spark-deps-hadoop-2.6 index 2c68b73..3b5a694 100644 --- a/dev/deps/spark-deps-hadoop-2.6 +++ b/dev/deps/spark-deps-hadoop-2.6 @@ -180,7 +180,7 @@ stax-api-1.0.1.jar stream-2.7.0.jar stringtemplate-3.2.1.jar super-csv-2.2.0.jar -univocity-parsers-2.5.4.jar +univocity-parsers-2.5.9.jar validation-api-1.1.0.Final.jar xbean-asm5-shaded-4.4.jar xercesImpl-2.9.1.jar http://git-wip-us.apache.org/repos/asf/spark/blob/9948b860/dev/deps/spark-deps-hadoop-2.7 -- diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7 index 2aaac60..64136ba 100644 --- a/dev/deps/spark-deps-hadoop-2.7 +++ b/dev/deps/spark-deps-hadoop-2.7 @@ -181,7 +181,7 @@ stax-api-1.0.1.jar stream-2.7.0.jar stringtemplate-3.2.1.jar super-csv-2.2.0.jar -univocity-parsers-2.5.4.jar +univocity-parsers-2.5.9.jar validation-api-1.1.0.Final.jar xbean-asm5-shaded-4.4.jar xercesImpl-2.9.1.jar http://git-wip-us.apache.org/repos/asf/spark/blob/9948b860/sql/core/pom.xml -- diff --git a/sql/core/pom.xml b/sql/core/pom.xml index 4db3fea..93010c6 100644 --- a/sql/core/pom.xml +++ b/sql/core/pom.xml @@ -38,7 +38,7 @@ com.univocity univocity-parsers - 2.5.4 + 2.5.9 jar 
http://git-wip-us.apache.org/repos/asf/spark/blob/9948b860/sql/core/src/test/resources/test-data/comments.csv
--
diff --git a/sql/core/src/test/resources/test-data/comments.csv b/sql/core/src/test/resources/test-data/comments.csv
index 6275be7..c0ace46 100644
--- a/sql/core/src/test/resources/test-data/comments.csv
+++ b/sql/core/src/test/resources/test-data/comments.csv
@@ -4,3 +4,4 @@
 6,7,8,9,0,2015-08-21 16:58:01
 ~0,9,8,7,6,2015-08-22 17:59:02
 1,2,3,4,5,2015-08-23 18:00:42
+~ comment in last line to test SPARK-22516 - do not add empty line at the end of this file!
\ No newline at end of file
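The reproducer from the PR description, made self-contained; the CSV path is a placeholder and a SparkSession `spark` is assumed:

```
// Reproducer from the PR description; the file name is a placeholder.
val parsed = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .option("multiLine", "true")
  .option("comment", "g")
  .csv("test_file_without_eof_char.csv")
parsed.show()  // fails with univocity 2.5.4 when the file has no trailing newline; works with 2.5.9
```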
spark git commit: [SPARK-22720][SS] Make EventTimeWatermark Extend UnaryNode
Repository: spark
Updated Branches:
  refs/heads/master 51066b437 -> effca9868

[SPARK-22720][SS] Make EventTimeWatermark Extend UnaryNode

## What changes were proposed in this pull request?

Our Analyzer and Optimizer have multiple rules for `UnaryNode`. After making `EventTimeWatermark` extend `UnaryNode`, we do not need special handling for `EventTimeWatermark`.

## How was this patch tested?

The existing tests

Author: gatorsmile

Closes #19913 from gatorsmile/eventtimewatermark.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/effca986
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/effca986
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/effca986

Branch: refs/heads/master
Commit: effca9868e3feae16c5722c36878b23e616d01a2
Parents: 51066b4
Author: gatorsmile
Authored: Wed Dec 6 13:11:38 2017 -0800
Committer: gatorsmile
Committed: Wed Dec 6 13:11:38 2017 -0800

--
 .../spark/sql/catalyst/plans/logical/EventTimeWatermark.scala | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/effca986/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/EventTimeWatermark.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/EventTimeWatermark.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/EventTimeWatermark.scala
index 06196b5..7a927e1 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/EventTimeWatermark.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/EventTimeWatermark.scala
@@ -38,7 +38,7 @@ object EventTimeWatermark {
 case class EventTimeWatermark(
     eventTime: Attribute,
     delay: CalendarInterval,
-    child: LogicalPlan) extends LogicalPlan {
+    child: LogicalPlan) extends UnaryNode {

   // Update the metadata on the eventTime column to include the desired delay.
   override val output: Seq[Attribute] = child.output.map { a =>
@@ -60,6 +60,4 @@ case class EventTimeWatermark(
       a
     }
   }
-
-  override val children: Seq[LogicalPlan] = child :: Nil
 }
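EventTimeWatermark is the logical node produced by the user-facing `withWatermark` call. A hedged sketch of that usage on the built-in rate source, assuming a SparkSession `spark` (not part of the commit):

```
// Hedged usage sketch -- not from this commit. Assumes a SparkSession `spark`.
import org.apache.spark.sql.functions.{col, window}

val events = spark.readStream
  .format("rate")                       // built-in test source emitting (timestamp, value) rows
  .option("rowsPerSecond", "10")
  .load()

val counts = events
  .withWatermark("timestamp", "10 minutes")          // adds an EventTimeWatermark node to the plan
  .groupBy(window(col("timestamp"), "5 minutes"))
  .count()
```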
spark git commit: [SPARK-14228][CORE][YARN] Lost executor of RPC disassociated, and occurs exception: Could not find CoarseGrainedScheduler or it has been stopped
Repository: spark Updated Branches: refs/heads/master 4286cba7d -> 51066b437 [SPARK-14228][CORE][YARN] Lost executor of RPC disassociated, and occurs exception: Could not find CoarseGrainedScheduler or it has been stopped ## What changes were proposed in this pull request? I see the two instances where the exception is occurring. **Instance 1:** ``` 17/11/10 15:49:32 ERROR util.Utils: Uncaught exception in thread driver-revive-thread org.apache.spark.SparkException: Could not find CoarseGrainedScheduler. at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:160) at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:140) at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:187) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:521) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(CoarseGrainedSchedulerBackend.scala:125) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(CoarseGrainedSchedulerBackend.scala:125) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anon$1$$anonfun$run$1.apply$mcV$sp(CoarseGrainedSchedulerBackend.scala:125) at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1344) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anon$1.run(CoarseGrainedSchedulerBackend.scala:124) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) ``` In CoarseGrainedSchedulerBackend.scala, driver-revive-thread starts with DriverEndpoint.onStart() and keeps sending the ReviveOffers messages periodically till it gets shutdown as part DriverEndpoint.onStop(). There is no proper coordination between the driver-revive-thread(shutdown) and the RpcEndpoint unregister, RpcEndpoint unregister happens first and then driver-revive-thread shuts down as part of DriverEndpoint.onStop(), In-between driver-revive-thread may try to send the ReviveOffers message which is leading to the above exception. To fix this issue, this PR moves the shutting down of driver-revive-thread to CoarseGrainedSchedulerBackend.stop() which executes before the DriverEndpoint unregister. **Instance 2:** ``` 17/11/10 16:31:38 ERROR cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Error requesting driver to remove executor 1 for reason Executor for container container_1508535467865_0226_01_02 exited because of a YARN event (e.g., pre-emption) and not because of an error in the running job. org.apache.spark.SparkException: Could not find CoarseGrainedScheduler. 
at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:160) at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135) at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:516) at org.apache.spark.rpc.RpcEndpointRef.ask(RpcEndpointRef.scala:63) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receive$1.applyOrElse(YarnSchedulerBackend.scala:269) at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205) at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101) at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) ``` Here YarnDriverEndpoint tries to send remove executor messages after the Yarn scheduler backend service stop, which is leading to the above exception. To avoid the above exception, 1) We may add a condition(which checks whether service has stopped or not) before sending executor remove message 2) Add a warn log message in onFailure case when the service is already stopped In this PR, chosen the 2) option which adds a log message
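The essence of the fix is ordering plus a guard: stop the periodic revive thread before the endpoint is unregistered, and tolerate (or merely log) sends that race with shutdown. A minimal illustrative sketch of that pattern, not the actual CoarseGrainedSchedulerBackend code:

```
// Illustrative sketch only -- not the actual Spark scheduler code.
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

class ReviveLoop(send: () => Unit) {
  private val stopped = new AtomicBoolean(false)
  private val executor = Executors.newSingleThreadScheduledExecutor()

  executor.scheduleAtFixedRate(new Runnable {
    // Skip the send once stop() has been called instead of racing the endpoint teardown.
    override def run(): Unit = if (!stopped.get) send()
  }, 0, 1, TimeUnit.SECONDS)

  def stop(): Unit = {
    stopped.set(true)       // stop producing messages first...
    executor.shutdownNow()  // ...then tear the thread down, mirroring the reordering in this PR
  }
}
```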
spark git commit: [SPARK-22710] ConfigBuilder.fallbackConf should trigger onCreate function
Repository: spark
Updated Branches:
  refs/heads/master e98f9647f -> 4286cba7d

[SPARK-22710] ConfigBuilder.fallbackConf should trigger onCreate function

## What changes were proposed in this pull request?

I was looking at the config code today and found that configs defined using ConfigBuilder.fallbackConf didn't trigger the onCreate function. This patch fixes it.

This doesn't require backporting since we currently have no configs that use it.

## How was this patch tested?

Added a test case for all the config final creator functions in ConfigEntrySuite.

Author: Reynold Xin

Closes #19905 from rxin/SPARK-22710.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4286cba7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4286cba7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4286cba7

Branch: refs/heads/master
Commit: 4286cba7dacf4b457fff91da3743ac2518699945
Parents: e98f964
Author: Reynold Xin
Authored: Wed Dec 6 10:11:25 2017 -0800
Committer: gatorsmile
Committed: Wed Dec 6 10:11:25 2017 -0800

--
 .../spark/internal/config/ConfigBuilder.scala  |  4 +++-
 .../internal/config/ConfigEntrySuite.scala     | 20
 2 files changed, 23 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/4286cba7/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
--
diff --git a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
index 8f4c1b6..b0cd711 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala
@@ -235,7 +235,9 @@ private[spark] case class ConfigBuilder(key: String) {
   }

   def fallbackConf[T](fallback: ConfigEntry[T]): ConfigEntry[T] = {
-    new FallbackConfigEntry(key, _alternatives, _doc, _public, fallback)
+    val entry = new FallbackConfigEntry(key, _alternatives, _doc, _public, fallback)
+    _onCreate.foreach(_(entry))
+    entry
   }

   def regexConf: TypedConfigBuilder[Regex] = {

http://git-wip-us.apache.org/repos/asf/spark/blob/4286cba7/core/src/test/scala/org/apache/spark/internal/config/ConfigEntrySuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/internal/config/ConfigEntrySuite.scala b/core/src/test/scala/org/apache/spark/internal/config/ConfigEntrySuite.scala
index bf08276..02514dc 100644
--- a/core/src/test/scala/org/apache/spark/internal/config/ConfigEntrySuite.scala
+++ b/core/src/test/scala/org/apache/spark/internal/config/ConfigEntrySuite.scala
@@ -288,4 +288,24 @@ class ConfigEntrySuite extends SparkFunSuite {
     conf.remove(testKey("b"))
     assert(conf.get(iConf) === 3)
   }
+
+  test("onCreate") {
+    var onCreateCalled = false
+    ConfigBuilder(testKey("oc1")).onCreate(_ => onCreateCalled = true).intConf.createWithDefault(1)
+    assert(onCreateCalled)
+
+    onCreateCalled = false
+    ConfigBuilder(testKey("oc2")).onCreate(_ => onCreateCalled = true).intConf.createOptional
+    assert(onCreateCalled)
+
+    onCreateCalled = false
+    ConfigBuilder(testKey("oc3")).onCreate(_ => onCreateCalled = true).intConf
+      .createWithDefaultString("1.0")
+    assert(onCreateCalled)
+
+    val fallback = ConfigBuilder(testKey("oc4")).intConf.createWithDefault(1)
+    onCreateCalled = false
+    ConfigBuilder(testKey("oc5")).onCreate(_ => onCreateCalled = true).fallbackConf(fallback)
+    assert(onCreateCalled)
+  }
 }
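Compressed to its essence, the new test verifies that `fallbackConf` now invokes the `onCreate` callback just like the other creator methods. A sketch distilled from the test code above; ConfigBuilder is Spark-internal (`private[spark]`) API, so this only compiles inside Spark's own sources, and the keys are placeholders:

```
// Sketch distilled from the ConfigEntrySuite additions; internal API, placeholder keys.
import org.apache.spark.internal.config.ConfigBuilder

val fallback = ConfigBuilder("spark.test.fallback").intConf.createWithDefault(1)

var registered = false
val derived = ConfigBuilder("spark.test.derived")
  .onCreate(_ => registered = true)  // was silently skipped by fallbackConf before this patch
  .fallbackConf(fallback)

assert(registered)
```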
spark git commit: [SPARK-22695][SQL] ScalaUDF should not use global variables
Repository: spark Updated Branches: refs/heads/master 813c0f945 -> e98f9647f [SPARK-22695][SQL] ScalaUDF should not use global variables ## What changes were proposed in this pull request? ScalaUDF is using global variables which are not needed. This can generate some unneeded entries in the constant pool. The PR replaces the unneeded global variables with local variables. ## How was this patch tested? added UT Author: Marco GaidoAuthor: Marco Gaido Closes #19900 from mgaido91/SPARK-22695. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e98f9647 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e98f9647 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e98f9647 Branch: refs/heads/master Commit: e98f9647f44d1071a6b070db070841b8cda6bd7a Parents: 813c0f9 Author: Marco Gaido Authored: Thu Dec 7 00:50:49 2017 +0800 Committer: Wenchen Fan Committed: Thu Dec 7 00:50:49 2017 +0800 -- .../sql/catalyst/expressions/ScalaUDF.scala | 88 ++-- .../catalyst/expressions/ScalaUDFSuite.scala| 6 ++ 2 files changed, 51 insertions(+), 43 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e98f9647/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala index 1798530..4d26d98 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala @@ -982,35 +982,28 @@ case class ScalaUDF( // scalastyle:on line.size.limit - // Generate codes used to convert the arguments to Scala type for user-defined functions - private[this] def genCodeForConverter(ctx: CodegenContext, index: Int): String = { -val converterClassName = classOf[Any => Any].getName -val typeConvertersClassName = CatalystTypeConverters.getClass.getName + ".MODULE$" -val expressionClassName = classOf[Expression].getName -val scalaUDFClassName = classOf[ScalaUDF].getName + private val converterClassName = classOf[Any => Any].getName + private val scalaUDFClassName = classOf[ScalaUDF].getName + private val typeConvertersClassName = CatalystTypeConverters.getClass.getName + ".MODULE$" + // Generate codes used to convert the arguments to Scala type for user-defined functions + private[this] def genCodeForConverter(ctx: CodegenContext, index: Int): (String, String) = { val converterTerm = ctx.freshName("converter") val expressionIdx = ctx.references.size - 1 -ctx.addMutableState(converterClassName, converterTerm, - s"$converterTerm = ($converterClassName)$typeConvertersClassName" + - s".createToScalaConverter(((${expressionClassName})((($scalaUDFClassName)" + - s"references[$expressionIdx]).getChildren().apply($index))).dataType());") -converterTerm +(converterTerm, + s"$converterClassName $converterTerm = ($converterClassName)$typeConvertersClassName" + +s".createToScalaConverter(((Expression)((($scalaUDFClassName)" + + s"references[$expressionIdx]).getChildren().apply($index))).dataType());") } override def doGenCode( ctx: CodegenContext, ev: ExprCode): ExprCode = { +val scalaUDF = ctx.freshName("scalaUDF") +val scalaUDFRef = ctx.addReferenceMinorObj(this, scalaUDFClassName) -val scalaUDF = ctx.addReferenceObj("scalaUDF", this) -val converterClassName = classOf[Any => Any].getName -val typeConvertersClassName = 
CatalystTypeConverters.getClass.getName + ".MODULE$" - -// Generate codes used to convert the returned value of user-defined functions to Catalyst type +// Object to convert the returned value of user-defined functions to Catalyst type val catalystConverterTerm = ctx.freshName("catalystConverter") -ctx.addMutableState(converterClassName, catalystConverterTerm, - s"$catalystConverterTerm = ($converterClassName)$typeConvertersClassName" + -s".createToCatalystConverter($scalaUDF.dataType());") val resultTerm = ctx.freshName("result") @@ -1022,8 +1015,6 @@ case class ScalaUDF( val funcClassName = s"scala.Function${children.size}" val funcTerm = ctx.freshName("udf") -ctx.addMutableState(funcClassName, funcTerm, - s"$funcTerm = ($funcClassName)$scalaUDF.userDefinedFunc();") // codegen for children expressions val evals = children.map(_.genCode(ctx)) @@ -1033,34 +1024,45 @@
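A registered Scala function becomes a ScalaUDF expression, which is what the `doGenCode` above compiles. A hedged usage sketch, assuming a SparkSession `spark` (not part of the commit):

```
// Hedged usage sketch -- not from this commit. Assumes a SparkSession `spark`.
import org.apache.spark.sql.functions.{col, udf}

val plusOne = udf((x: Long) => x + 1)   // becomes a ScalaUDF node in the plan
val out = spark.range(5).select(plusOne(col("id")).as("id_plus_one"))
out.show()
```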
spark git commit: [SPARK-22704][SQL] Least and Greatest use less global variables
Repository: spark Updated Branches: refs/heads/master 6f41c593b -> 813c0f945 [SPARK-22704][SQL] Least and Greatest use less global variables ## What changes were proposed in this pull request? This PR accomplishes the following two items. 1. Reduce # of global variables from two to one 2. Make lifetime of global variable local within an operation Item 1. reduces # of constant pool entries in a Java class. Item 2. ensures that an variable is not passed to arguments in a method split by `CodegenContext.splitExpressions()`, which is addressed by #19865. ## How was this patch tested? Added new test into `ArithmeticExpressionSuite` Author: Kazuaki IshizakiCloses #19899 from kiszk/SPARK-22704. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/813c0f94 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/813c0f94 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/813c0f94 Branch: refs/heads/master Commit: 813c0f945d7f03800975eaed26b86a1f30e513c9 Parents: 6f41c59 Author: Kazuaki Ishizaki Authored: Thu Dec 7 00:45:51 2017 +0800 Committer: Wenchen Fan Committed: Thu Dec 7 00:45:51 2017 +0800 -- .../sql/catalyst/expressions/arithmetic.scala | 94 +--- .../expressions/ArithmeticExpressionSuite.scala | 11 +++ 2 files changed, 73 insertions(+), 32 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/813c0f94/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala index 739bd13..1893eec 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala @@ -602,23 +602,38 @@ case class Least(children: Seq[Expression]) extends Expression { override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evalChildren = children.map(_.genCode(ctx)) -ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull) -ctx.addMutableState(ctx.javaType(dataType), ev.value) -def updateEval(eval: ExprCode): String = { +val tmpIsNull = ctx.freshName("leastTmpIsNull") +ctx.addMutableState(ctx.JAVA_BOOLEAN, tmpIsNull) +val evals = evalChildren.map(eval => s""" -${eval.code} -if (!${eval.isNull} && (${ev.isNull} || - ${ctx.genGreater(dataType, ev.value, eval.value)})) { - ${ev.isNull} = false; - ${ev.value} = ${eval.value}; -} - """ -} -val codes = ctx.splitExpressionsWithCurrentInputs(evalChildren.map(updateEval)) -ev.copy(code = s""" - ${ev.isNull} = true; - ${ev.value} = ${ctx.defaultValue(dataType)}; - $codes""") + |${eval.code} + |if (!${eval.isNull} && ($tmpIsNull || + | ${ctx.genGreater(dataType, ev.value, eval.value)})) { + | $tmpIsNull = false; + | ${ev.value} = ${eval.value}; + |} + """.stripMargin +) + +val resultType = ctx.javaType(dataType) +val codes = ctx.splitExpressionsWithCurrentInputs( + expressions = evals, + funcName = "least", + extraArguments = Seq(resultType -> ev.value), + returnType = resultType, + makeSplitFunction = body => +s""" + |$body + |return ${ev.value}; +""".stripMargin, + foldFunctions = _.map(funcCall => s"${ev.value} = $funcCall;").mkString("\n")) +ev.copy(code = + s""" + |$tmpIsNull = true; + |${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)}; + |$codes + |final boolean ${ev.isNull} = $tmpIsNull; + """.stripMargin) } } @@ -668,22 
+683,37 @@ case class Greatest(children: Seq[Expression]) extends Expression { override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { val evalChildren = children.map(_.genCode(ctx)) -ctx.addMutableState(ctx.JAVA_BOOLEAN, ev.isNull) -ctx.addMutableState(ctx.javaType(dataType), ev.value) -def updateEval(eval: ExprCode): String = { +val tmpIsNull = ctx.freshName("greatestTmpIsNull") +ctx.addMutableState(ctx.JAVA_BOOLEAN, tmpIsNull) +val evals = evalChildren.map(eval => s""" -${eval.code} -if (!${eval.isNull} && (${ev.isNull} || - ${ctx.genGreater(dataType, eval.value, ev.value)})) { - ${ev.isNull} = false; - ${ev.value} = ${eval.value}; -} - """ -} -val codes =
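`least()` and `greatest()` are the SQL functions that construct these expressions. A hedged usage sketch, assuming a SparkSession `spark` (not part of the commit):

```
// Hedged usage sketch -- not from this commit. Assumes a SparkSession `spark`.
import org.apache.spark.sql.functions.{col, greatest, least}

val df = spark.createDataFrame(Seq((1, 5, 3), (7, 2, 9))).toDF("a", "b", "c")
df.select(
  least(col("a"), col("b"), col("c")).as("row_min"),      // Least expression
  greatest(col("a"), col("b"), col("c")).as("row_max")    // Greatest expression
).show()
```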
spark git commit: [SPARK-22690][ML] Imputer inherit HasOutputCols
Repository: spark
Updated Branches:
  refs/heads/master fb6a92275 -> 6f41c593b

[SPARK-22690][ML] Imputer inherit HasOutputCols

## What changes were proposed in this pull request?

make `Imputer` inherit `HasOutputCols`

## How was this patch tested?

existing tests

Author: Zheng RuiFeng

Closes #19889 from zhengruifeng/using_HasOutputCols.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6f41c593
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6f41c593
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6f41c593

Branch: refs/heads/master
Commit: 6f41c593bbefa946d13b62ecf4e85074fd3c1541
Parents: fb6a922
Author: Zheng RuiFeng
Authored: Wed Dec 6 08:27:17 2017 -0800
Committer: Holden Karau
Committed: Wed Dec 6 08:27:17 2017 -0800

--
 .../scala/org/apache/spark/ml/feature/Imputer.scala | 14 ++
 1 file changed, 2 insertions(+), 12 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/6f41c593/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala
index 4663f16..730ee9f 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala
@@ -23,7 +23,7 @@ import org.apache.spark.SparkException
 import org.apache.spark.annotation.{Experimental, Since}
 import org.apache.spark.ml.{Estimator, Model}
 import org.apache.spark.ml.param._
-import org.apache.spark.ml.param.shared.HasInputCols
+import org.apache.spark.ml.param.shared.{HasInputCols, HasOutputCols}
 import org.apache.spark.ml.util._
 import org.apache.spark.sql.{DataFrame, Dataset, Row}
 import org.apache.spark.sql.functions._
@@ -32,7 +32,7 @@ import org.apache.spark.sql.types._
 /**
  * Params for [[Imputer]] and [[ImputerModel]].
  */
-private[feature] trait ImputerParams extends Params with HasInputCols {
+private[feature] trait ImputerParams extends Params with HasInputCols with HasOutputCols {

   /**
    * The imputation strategy. Currently only "mean" and "median" are supported.
@@ -63,16 +63,6 @@ private[feature] trait ImputerParams extends Params with HasInputCols {
   /** @group getParam */
   def getMissingValue: Double = $(missingValue)

-  /**
-   * Param for output column names.
-   * @group param
-   */
-  final val outputCols: StringArrayParam = new StringArrayParam(this, "outputCols",
-    "output column names")
-
-  /** @group getParam */
-  final def getOutputCols: Array[String] = $(outputCols)
-
   /** Validates and transforms the input schema. */
   protected def validateAndTransformSchema(schema: StructType): StructType = {
     require($(inputCols).length == $(inputCols).distinct.length, s"inputCols contains" +
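With HasOutputCols mixed in, `setOutputCols` comes from the shared param trait rather than an Imputer-specific param; the public usage is unchanged. A hedged sketch, assuming a SparkSession `spark` (not part of the commit):

```
// Hedged usage sketch -- not from this commit. Assumes a SparkSession `spark`.
import org.apache.spark.ml.feature.Imputer

val data = spark.createDataFrame(Seq(
  (1.0, Double.NaN),
  (2.0, 3.0),
  (Double.NaN, 5.0)
)).toDF("a", "b")

val imputer = new Imputer()
  .setInputCols(Array("a", "b"))
  .setOutputCols(Array("a_imputed", "b_imputed"))   // provided by HasOutputCols after this change
  .setStrategy("mean")

imputer.fit(data).transform(data).show()
```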
spark git commit: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception message
Repository: spark
Updated Branches:
  refs/heads/master 00d176d2f -> fb6a92275

[SPARK-20728][SQL][FOLLOWUP] Use an actionable exception message

## What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/19871 to improve an exception message.

## How was this patch tested?

Pass the Jenkins.

Author: Dongjoon Hyun

Closes #19903 from dongjoon-hyun/orc_exception.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fb6a9227
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fb6a9227
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fb6a9227

Branch: refs/heads/master
Commit: fb6a9227516f893815fb0b6d26b578e21badd664
Parents: 00d176d
Author: Dongjoon Hyun
Authored: Wed Dec 6 20:20:20 2017 +0900
Committer: hyukjinkwon
Committed: Wed Dec 6 20:20:20 2017 +0900

--
 .../apache/spark/sql/execution/datasources/DataSource.scala | 8 ++++----
 .../src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/fb6a9227/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
index 5f12d5f..b676672 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
@@ -598,11 +598,11 @@ object DataSource extends Logging {
             // Found the data source using fully qualified path
             dataSource
           case Failure(error) =>
-            if (provider1.toLowerCase(Locale.ROOT) == "orc" ||
-              provider1.startsWith("org.apache.spark.sql.hive.orc")) {
+            if (provider1.startsWith("org.apache.spark.sql.hive.orc")) {
               throw new AnalysisException(
-                "Hive-based ORC data source must be used with Hive support enabled. " +
-                "Please use native ORC data source instead")
+                "Hive built-in ORC data source must be used with Hive support enabled. " +
+                "Please use the native ORC data source by setting 'spark.sql.orc.impl' to " +
+                "'native'")
             } else if (provider1.toLowerCase(Locale.ROOT) == "avro" ||
               provider1 == "com.databricks.spark.avro") {
               throw new AnalysisException(

http://git-wip-us.apache.org/repos/asf/spark/blob/fb6a9227/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index 86bd9b9..8be 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -1666,7 +1666,7 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     e = intercept[AnalysisException] {
       sql(s"select id from `org.apache.spark.sql.hive.orc`.`file_path`")
     }
-    assert(e.message.contains("Hive-based ORC data source must be used with Hive support"))
+    assert(e.message.contains("Hive built-in ORC data source must be used with Hive support"))

     e = intercept[AnalysisException] {
       sql(s"select id from `com.databricks.spark.avro`.`file_path`")
@@ -2790,7 +2790,7 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
       val e = intercept[AnalysisException] {
         sql("CREATE TABLE spark_20728(a INT) USING ORC")
       }
-      assert(e.message.contains("Hive-based ORC data source must be used with Hive support"))
+      assert(e.message.contains("Hive built-in ORC data source must be used with Hive support"))
     }

     withSQLConf(SQLConf.ORC_IMPLEMENTATION.key -> "native") {