[jira] [Created] (SPARK-18171) Show correct framework address in mesos master web ui when the advertised address is used
Shuai Lin created SPARK-18171: - Summary: Show correct framework address in mesos master web ui when the advertised address is used Key: SPARK-18171 URL: https://issues.apache.org/jira/browse/SPARK-18171 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Shuai Lin Priority: Minor In INF-4563 we added support for the driver to advertise a different hostname/IP ({{spark.driver.host}}) to the executors, other than the hostname/IP the driver actually binds to ({{spark.driver.bindAddress}}). But in the Mesos web UI's frameworks page, it still shows the driver's bind hostname/IP (though the web UI link is correct). We should fix it to make them consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
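For context, a minimal sketch of the two settings involved, as they might appear when building the driver's configuration (the master URL and hostnames are hypothetical; only the two property names come from the issue text). Per the report, the Mesos frameworks page currently shows the bind address rather than the advertised one.

{code}
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical values: the driver binds to a local/private address but
// advertises a routable hostname to executors (and, ideally, to the Mesos UI).
val conf = new SparkConf()
  .setMaster("mesos://zk://zk1:2181/mesos")           // hypothetical Mesos master URL
  .set("spark.driver.bindAddress", "0.0.0.0")         // address the driver actually binds to
  .set("spark.driver.host", "driver.example.com")     // address advertised to executors

val spark = SparkSession.builder.config(conf).getOrCreate()
{code}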
[jira] [Updated] (SPARK-18171) Show correct framework address in mesos master web ui when the advertised address is used
[ https://issues.apache.org/jira/browse/SPARK-18171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Lin updated SPARK-18171: -- Description: In [[SPARK-4563]] we added the support for the driver to advertise a different hostname/ip ({{spark.driver.host}} to the executors other than the hostname/ip the driver actually binds to ({{spark.driver.bindAddress}}). But in the mesos webui's frameworks page, it still shows the driver's binds hostname/ip (though the web ui link is correct). We should fix it to make them consistent. (was: In INF-4563 we added the support for the driver to advertise a different hostname/ip ({{spark.driver.host}} to the executors other than the hostname/ip the driver actually binds to ({{spark.driver.bindAddress}}). But in the mesos webui's frameworks page, it still shows the driver's binds hostname/ip (though the web ui link is correct). We should fix it to make them consistent.) > Show correct framework address in mesos master web ui when the advertised > address is used > - > > Key: SPARK-18171 > URL: https://issues.apache.org/jira/browse/SPARK-18171 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Shuai Lin >Priority: Minor > > In [[SPARK-4563]] we added the support for the driver to advertise a > different hostname/ip ({{spark.driver.host}} to the executors other than the > hostname/ip the driver actually binds to ({{spark.driver.bindAddress}}). But > in the mesos webui's frameworks page, it still shows the driver's binds > hostname/ip (though the web ui link is correct). We should fix it to make > them consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18171) Show correct framework address in mesos master web ui when the advertised address is used
[ https://issues.apache.org/jira/browse/SPARK-18171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15619520#comment-15619520 ] Apache Spark commented on SPARK-18171: -- User 'lins05' has created a pull request for this issue: https://github.com/apache/spark/pull/15684 > Show correct framework address in mesos master web ui when the advertised > address is used > - > > Key: SPARK-18171 > URL: https://issues.apache.org/jira/browse/SPARK-18171 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Shuai Lin >Priority: Minor > > In [[SPARK-4563]] we added the support for the driver to advertise a > different hostname/ip ({{spark.driver.host}} to the executors other than the > hostname/ip the driver actually binds to ({{spark.driver.bindAddress}}). But > in the mesos webui's frameworks page, it still shows the driver's binds > hostname/ip (though the web ui link is correct). We should fix it to make > them consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-18171) Show correct framework address in mesos master web ui when the advertised address is used
[ https://issues.apache.org/jira/browse/SPARK-18171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18171: Assignee: (was: Apache Spark) > Show correct framework address in mesos master web ui when the advertised > address is used > - > > Key: SPARK-18171 > URL: https://issues.apache.org/jira/browse/SPARK-18171 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Shuai Lin >Priority: Minor > > In [[SPARK-4563]] we added the support for the driver to advertise a > different hostname/ip ({{spark.driver.host}} to the executors other than the > hostname/ip the driver actually binds to ({{spark.driver.bindAddress}}). But > in the mesos webui's frameworks page, it still shows the driver's binds > hostname/ip (though the web ui link is correct). We should fix it to make > them consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-18171) Show correct framework address in mesos master web ui when the advertised address is used
[ https://issues.apache.org/jira/browse/SPARK-18171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18171: Assignee: Apache Spark > Show correct framework address in mesos master web ui when the advertised > address is used > - > > Key: SPARK-18171 > URL: https://issues.apache.org/jira/browse/SPARK-18171 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Shuai Lin >Assignee: Apache Spark >Priority: Minor > > In [[SPARK-4563]] we added the support for the driver to advertise a > different hostname/ip ({{spark.driver.host}} to the executors other than the > hostname/ip the driver actually binds to ({{spark.driver.bindAddress}}). But > in the mesos webui's frameworks page, it still shows the driver's binds > hostname/ip (though the web ui link is correct). We should fix it to make > them consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-18162) SparkEnv.get.metricsSystem in spark-shell results in error: missing or invalid dependency detected while loading class file 'MetricsSystem.class'
[ https://issues.apache.org/jira/browse/SPARK-18162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-18162. --- Resolution: Not A Problem > SparkEnv.get.metricsSystem in spark-shell results in error: missing or > invalid dependency detected while loading class file 'MetricsSystem.class' > - > > Key: SPARK-18162 > URL: https://issues.apache.org/jira/browse/SPARK-18162 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 >Reporter: Jacek Laskowski >Priority: Minor > > This is with the build today from master. > {code} > $ ./bin/spark-shell --version > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 2.1.0-SNAPSHOT > /_/ > Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_102 > Branch master > Compiled by user jacek on 2016-10-28T04:05:11Z > Revision ab5f938bc7c3c9b137d63e479fced2b7e9c9d75b > Url https://github.com/apache/spark.git > Type --help for more information. > $ ./bin/spark-shell > scala> SparkEnv.get.metricsSystem > error: missing or invalid dependency detected while loading class file > 'MetricsSystem.class'. > Could not access term eclipse in package org, > because it (or its dependencies) are missing. Check your build definition for > missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see > the problematic classpath.) > A full rebuild may help if 'MetricsSystem.class' was compiled against an > incompatible version of org. > error: missing or invalid dependency detected while loading class file > 'MetricsSystem.class'. > Could not access term jetty in value org.eclipse, > because it (or its dependencies) are missing. Check your build definition for > missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see > the problematic classpath.) > A full rebuild may help if 'MetricsSystem.class' was compiled against an > incompatible version of org.eclipse. > scala> spark.version > res3: String = 2.1.0-SNAPSHOT > {code} > I could not find any information about how to set it up in the [official > documentation|http://spark.apache.org/docs/latest/monitoring.html#metrics]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18170) Confusing error message when using rangeBetween without specifying an "orderBy"
[ https://issues.apache.org/jira/browse/SPARK-18170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-18170: -- Issue Type: Improvement (was: Bug) > Confusing error message when using rangeBetween without specifying an > "orderBy" > --- > > Key: SPARK-18170 > URL: https://issues.apache.org/jira/browse/SPARK-18170 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Weiluo Ren >Priority: Minor > > {code} > spark.range(1,3).select(sum('id) over Window.rangeBetween(0,1)).show > {code} > throws runtime exception: > {code} > Non-Zero range offsets are not supported for windows with multiple order > expressions. > {code} > which is confusing in this case because we don't have any order expression > here. > How about add a check on > {code} > orderSpec.isEmpty > {code} > at > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala#L141 > and throw an exception saying "no order expressions is specified"? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
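For readers following along, a small Scala sketch of the two variants discussed above (assuming a spark-shell session where {{spark}} and the {{$}} column syntax are in scope); only the failing one-liner comes from the issue text, the corrected form is illustrative:

{code}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

// Failing form from the issue: a range frame with no ordering at all.
// spark.range(1, 3).select(sum('id) over Window.rangeBetween(0, 1)).show

// With an explicit orderBy the range frame is well defined and the query runs.
val w = Window.orderBy("id").rangeBetween(0, 1)
spark.range(1, 3).select(sum($"id").over(w)).show()
{code}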
[jira] [Resolved] (SPARK-3261) KMeans clusterer can return duplicate cluster centers
[ https://issues.apache.org/jira/browse/SPARK-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-3261. -- Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 15450 [https://github.com/apache/spark/pull/15450] > KMeans clusterer can return duplicate cluster centers > - > > Key: SPARK-3261 > URL: https://issues.apache.org/jira/browse/SPARK-3261 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.0.2 >Reporter: Derrick Burns >Priority: Minor > Labels: clustering > Fix For: 2.1.0 > > > This is a bad design choice. I think that it is preferable to produce no > duplicate cluster centers. So instead of forcing the number of clusters to be > K, return at most K clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
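A hypothetical repro sketch of the behaviour being fixed (data and parameters are illustrative, assuming a spark-shell session with {{sc}} in scope): with fewer distinct points than k, the old clusterer could return repeated centers, while the fix returns at most k distinct ones.

{code}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Only two distinct points, but k = 4.
val data = sc.parallelize(Seq(
  Vectors.dense(0.0, 0.0),
  Vectors.dense(0.0, 0.0),
  Vectors.dense(1.0, 1.0),
  Vectors.dense(1.0, 1.0)))

val model = KMeans.train(data, k = 4, maxIterations = 10)
// Before the fix this could print duplicate centers; afterwards at most
// two distinct centers are expected.
model.clusterCenters.foreach(println)
{code}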
[jira] [Created] (SPARK-18172) AnalysisException in first/last during aggregation
Emlyn Corrin created SPARK-18172: Summary: AnalysisException in first/last during aggregation Key: SPARK-18172 URL: https://issues.apache.org/jira/browse/SPARK-18172 Project: Spark Issue Type: Bug Affects Versions: 2.0.1 Reporter: Emlyn Corrin Since Spark 2.0.1, the following pyspark snippet fails with {{AnalysisException: The second argument of First should be a boolean literal}} (but it's not restricted to Python, similar code with in Java fails in the same way). It worked in Spark 2.0.0, so I believe it may be related to the fix for SPARK-16648. {code} from pyspark.sql import functions as F ds = spark.createDataFrame(sc.parallelize([[1, 1, 2], [1, 2, 3], [1, 3, 4]])) ds.groupBy(ds._1).agg(F.first(ds._2), F.countDistinct(ds._2), F.countDistinct(ds._2, ds._3)).show() {code} It works if any of the three arguments to {{.agg}} is removed. The stack trace is: {code} Py4JJavaError Traceback (most recent call last) in () > 1 ds.groupBy(ds._1).agg(F.first(ds._2),F.countDistinct(ds._2),F.countDistinct(ds._2, ds._3)).show() /usr/local/Cellar/apache-spark/2.0.1/libexec/python/pyspark/sql/dataframe.py in show(self, n, truncate) 285 +---+-+ 286 """ --> 287 print(self._jdf.showString(n, truncate)) 288 289 def __repr__(self): /usr/local/Cellar/apache-spark/2.0.1/libexec/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py in __call__(self, *args) 1131 answer = self.gateway_client.send_command(command) 1132 return_value = get_return_value( -> 1133 answer, self.gateway_client, self.target_id, self.name) 1134 1135 for temp_arg in temp_args: /usr/local/Cellar/apache-spark/2.0.1/libexec/python/pyspark/sql/utils.py in deco(*a, **kw) 61 def deco(*a, **kw): 62 try: ---> 63 return f(*a, **kw) 64 except py4j.protocol.Py4JJavaError as e: 65 s = e.java_exception.toString() /usr/local/Cellar/apache-spark/2.0.1/libexec/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 317 raise Py4JJavaError( 318 "An error occurred while calling {0}{1}{2}.\n". --> 319 format(target_id, ".", name), value) 320 else: 321 raise Py4JError( Py4JJavaError: An error occurred while calling o76.showString. 
: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: makeCopy, tree: first(_2#1L)() at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) at org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:387) at org.apache.spark.sql.catalyst.trees.TreeNode.withNewChildren(TreeNode.scala:256) at org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates$.org$apache$spark$sql$catalyst$optimizer$RewriteDistinctAggregates$$patchAggregateFunctionChildren$1(RewriteDistinctAggregates.scala:140) at org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates$$anonfun$16.apply(RewriteDistinctAggregates.scala:182) at org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates$$anonfun$16.apply(RewriteDistinctAggregates.scala:180) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates$.rewrite(RewriteDistinctAggregates.scala:180) at org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates$$anonfun$apply$1.applyOrElse(RewriteDistinctAggregates.scala:105) at org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates$$anonfun$apply$1.applyOrElse(RewriteDistinctAggregates.scala:104) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:300) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfu
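Since the report notes the failure is not Python-specific, here is a rough Scala equivalent of the same aggregation (an untested sketch, assuming a spark-shell session with {{spark.implicits._}} in scope); per the issue it fails only when all three aggregates are combined:

{code}
import org.apache.spark.sql.functions.{countDistinct, first}

val ds = Seq((1, 1, 2), (1, 2, 3), (1, 3, 4)).toDF("_1", "_2", "_3")

// Works with any two of the aggregates; reportedly fails with all three.
ds.groupBy($"_1")
  .agg(first($"_2"), countDistinct($"_2"), countDistinct($"_2", $"_3"))
  .show()
{code}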
[jira] [Comment Edited] (SPARK-16648) LAST_VALUE(FALSE) OVER () throws IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/SPARK-16648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613656#comment-15613656 ] Emlyn Corrin edited comment on SPARK-16648 at 10/30/16 10:07 AM: - Edit: I've opened a new issue for this at SPARK-18172. Since Spark 2.0.1, the following pyspark snippet fails (I believe it worked under 2.0.0, so this issue seems like the most likely cause of change in behaviour): {code} from pyspark.sql import functions as F ds = spark.createDataFrame(sc.parallelize([[1, 1, 2], [1, 2, 3], [1, 3, 4]])) ds.groupBy(ds._1).agg(F.first(ds._2), F.countDistinct(ds._2), F.countDistinct(ds._2, ds._3)).show() {code} It works if any of the three arguments to {{.agg}} is removed. The stack trace is: {code} Py4JJavaError Traceback (most recent call last) in () > 1 ds.groupBy(ds._1).agg(F.first(ds._2),F.countDistinct(ds._2),F.countDistinct(ds._2, ds._3)).show() /usr/local/Cellar/apache-spark/2.0.1/libexec/python/pyspark/sql/dataframe.py in show(self, n, truncate) 285 +---+-+ 286 """ --> 287 print(self._jdf.showString(n, truncate)) 288 289 def __repr__(self): /usr/local/Cellar/apache-spark/2.0.1/libexec/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py in __call__(self, *args) 1131 answer = self.gateway_client.send_command(command) 1132 return_value = get_return_value( -> 1133 answer, self.gateway_client, self.target_id, self.name) 1134 1135 for temp_arg in temp_args: /usr/local/Cellar/apache-spark/2.0.1/libexec/python/pyspark/sql/utils.py in deco(*a, **kw) 61 def deco(*a, **kw): 62 try: ---> 63 return f(*a, **kw) 64 except py4j.protocol.Py4JJavaError as e: 65 s = e.java_exception.toString() /usr/local/Cellar/apache-spark/2.0.1/libexec/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name) 317 raise Py4JJavaError( 318 "An error occurred while calling {0}{1}{2}.\n". --> 319 format(target_id, ".", name), value) 320 else: 321 raise Py4JError( Py4JJavaError: An error occurred while calling o76.showString. 
: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: makeCopy, tree: first(_2#1L)() at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) at org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:387) at org.apache.spark.sql.catalyst.trees.TreeNode.withNewChildren(TreeNode.scala:256) at org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates$.org$apache$spark$sql$catalyst$optimizer$RewriteDistinctAggregates$$patchAggregateFunctionChildren$1(RewriteDistinctAggregates.scala:140) at org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates$$anonfun$16.apply(RewriteDistinctAggregates.scala:182) at org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates$$anonfun$16.apply(RewriteDistinctAggregates.scala:180) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates$.rewrite(RewriteDistinctAggregates.scala:180) at org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates$$anonfun$apply$1.applyOrElse(RewriteDistinctAggregates.scala:105) at org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates$$anonfun$apply$1.applyOrElse(RewriteDistinctAggregates.scala:104) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:300) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql
[jira] [Updated] (SPARK-18146) Avoid using Union to chain together create table and repair partition commands
[ https://issues.apache.org/jira/browse/SPARK-18146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-18146: Assignee: Eric Liang > Avoid using Union to chain together create table and repair partition commands > -- > > Key: SPARK-18146 > URL: https://issues.apache.org/jira/browse/SPARK-18146 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Eric Liang >Assignee: Eric Liang >Priority: Minor > > The behavior of union is not well defined here. We should add an internal > command to execute these commands sequentially. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-18146) Avoid using Union to chain together create table and repair partition commands
[ https://issues.apache.org/jira/browse/SPARK-18146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-18146. - Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 15665 [https://github.com/apache/spark/pull/15665] > Avoid using Union to chain together create table and repair partition commands > -- > > Key: SPARK-18146 > URL: https://issues.apache.org/jira/browse/SPARK-18146 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Eric Liang >Assignee: Eric Liang >Priority: Minor > Fix For: 2.1.0 > > > The behavior of union is not well defined here. We should add an internal > command to execute these commands sequentially. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
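A hypothetical sketch of what "an internal command to execute these commands sequentially" could look like (names and structure are illustrative, not the actual patch):

{code}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.execution.command.RunnableCommand

// Illustrative only: run each sub-command to completion before the next one,
// instead of relying on Union's undefined execution order.
case class SequentialCommands(commands: Seq[RunnableCommand]) extends RunnableCommand {
  override def run(sparkSession: SparkSession): Seq[Row] = {
    commands.foreach(_.run(sparkSession))
    Seq.empty[Row]
  }
}
{code}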
[jira] [Commented] (SPARK-16522) [MESOS] Spark application throws exception on exit
[ https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620225#comment-15620225 ] Harish commented on SPARK-16522: I am getting same error in spark 2.0.2 snapshot. Standalone submission. py4j.protocol.Py4JJavaError: An error occurred while calling o37785.count. : org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:194) at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120) at org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) at org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82) at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) at org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30) at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62) at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.consume(BroadcastHashJoinExec.scala:38) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:232) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82) at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) at org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30) at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62) at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) at org.apache.spark.sql.execution.FilterExec.consume(basicPhysicalOperators.scala:79) at org.apache.spark.sql.execution.FilterExec.doConsume(basicPhysicalOperators.scala:194) at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) at org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:218) at org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:244) at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83) at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) at 
org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78) at org.apache.spark.sql.execution.InputAdapter.produce(WholeStageCodegenExec.scala:218) at org.apache.spark.sql.execution.FilterExec.doProduce(basicPhysicalOperators.scala:113) at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83) at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78) at org.apache.spark.sql.execution.FilterExec.produce(basicPhysicalOperators.scala:79) at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:40) at org.apache
[jira] [Resolved] (SPARK-18043) Java example for Broadcasting
[ https://issues.apache.org/jira/browse/SPARK-18043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-18043. --- Resolution: Not A Problem > Java example for Broadcasting > - > > Key: SPARK-18043 > URL: https://issues.apache.org/jira/browse/SPARK-18043 > Project: Spark > Issue Type: Improvement > Components: Examples >Reporter: Akash Sethi >Priority: Minor > Attachments: JavaBroadcastTest.java > > Original Estimate: 1h > Remaining Estimate: 1h > > I have created a java example for Broadcasting similar to as it is in Scala i > would like to contribute the code for the same -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
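For reference, the Scala broadcast pattern that the proposed Java example would have mirrored (kept in Scala here for consistency with the other snippets; this is the standard programming-guide usage, not the content of the attached file):

{code}
// Create a read-only broadcast variable on the driver and read it on executors.
val broadcastVar = sc.broadcast(Array(1, 2, 3))

sc.parallelize(1 to 3).map(i => broadcastVar.value(i - 1)).collect()
// res: Array(1, 2, 3)
{code}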
[jira] [Created] (SPARK-18173) data source tables should support truncating partition
Wenchen Fan created SPARK-18173: --- Summary: data source tables should support truncating partition Key: SPARK-18173 URL: https://issues.apache.org/jira/browse/SPARK-18173 Project: Spark Issue Type: New Feature Components: SQL Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
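A sketch of the kind of statement this issue targets, assuming Hive-style {{TRUNCATE ... PARTITION}} syntax against a partitioned data source table (table and partition values are hypothetical):

{code}
spark.sql("CREATE TABLE logs (msg STRING, day STRING) USING parquet PARTITIONED BY (day)")
// Truncate a single partition of the data source table rather than the whole table.
spark.sql("TRUNCATE TABLE logs PARTITION (day = '2016-10-30')")
{code}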
[jira] [Assigned] (SPARK-18173) data source tables should support truncating partition
[ https://issues.apache.org/jira/browse/SPARK-18173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18173: Assignee: Apache Spark (was: Wenchen Fan) > data source tables should support truncating partition > -- > > Key: SPARK-18173 > URL: https://issues.apache.org/jira/browse/SPARK-18173 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Wenchen Fan >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-18173) data source tables should support truncating partition
[ https://issues.apache.org/jira/browse/SPARK-18173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18173: Assignee: Wenchen Fan (was: Apache Spark) > data source tables should support truncating partition > -- > > Key: SPARK-18173 > URL: https://issues.apache.org/jira/browse/SPARK-18173 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18173) data source tables should support truncating partition
[ https://issues.apache.org/jira/browse/SPARK-18173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620290#comment-15620290 ] Apache Spark commented on SPARK-18173: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/15688 > data source tables should support truncating partition > -- > > Key: SPARK-18173 > URL: https://issues.apache.org/jira/browse/SPARK-18173 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620392#comment-15620392 ] Saikat Kanjilal commented on SPARK-9487: PR attached here: https://github.com/apache/spark/pull/15689 I only changed everything to local[4] in core and ran the unit tests; all unit tests ran successfully. This is a WIP, so once folks have reviewed and signed off on this initial request I will start changing the Python pieces. [~holdenk][~sowen] let me know the next steps > Use the same num. worker threads in Scala/Python unit tests > --- > > Key: SPARK-9487 > URL: https://issues.apache.org/jira/browse/SPARK-9487 > Project: Spark > Issue Type: Improvement > Components: PySpark, Spark Core, SQL, Tests >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng > Labels: starter > Attachments: ContextCleanerSuiteResults, HeartbeatReceiverSuiteResults > > > In Python we use `local[4]` for unit tests, while in Scala/Java we use > `local[2]` and `local` for some unit tests in SQL, MLLib, and other > components. If the operation depends on partition IDs, e.g., random number > generator, this will lead to different result in Python and Scala/Java. It > would be nice to use the same number in all unit tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
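Illustrative only, the shape of the change described in the comment above: test suites that build their own context would standardize on {{local[4]}} to match the Python tests.

{code}
import org.apache.spark.{SparkConf, SparkContext}

// was .setMaster("local[2]") or .setMaster("local") in some suites
val conf = new SparkConf()
  .setMaster("local[4]")
  .setAppName("test")
val sc = new SparkContext(conf)
{code}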
[jira] [Commented] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620402#comment-15620402 ] Sean Owen commented on SPARK-9487: -- Again have a look at https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark -- you need to update your PR title to link it. > Use the same num. worker threads in Scala/Python unit tests > --- > > Key: SPARK-9487 > URL: https://issues.apache.org/jira/browse/SPARK-9487 > Project: Spark > Issue Type: Improvement > Components: PySpark, Spark Core, SQL, Tests >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng > Labels: starter > Attachments: ContextCleanerSuiteResults, HeartbeatReceiverSuiteResults > > > In Python we use `local[4]` for unit tests, while in Scala/Java we use > `local[2]` and `local` for some unit tests in SQL, MLLib, and other > components. If the operation depends on partition IDs, e.g., random number > generator, this will lead to different result in Python and Scala/Java. It > would be nice to use the same number in all unit tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18170) Confusing error message when using rangeBetween without specifying an "orderBy"
[ https://issues.apache.org/jira/browse/SPARK-18170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-18170: -- Target Version/s: (was: 2.0.1) > Confusing error message when using rangeBetween without specifying an > "orderBy" > --- > > Key: SPARK-18170 > URL: https://issues.apache.org/jira/browse/SPARK-18170 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Weiluo Ren >Priority: Minor > > {code} > spark.range(1,3).select(sum('id) over Window.rangeBetween(0,1)).show > {code} > throws runtime exception: > {code} > Non-Zero range offsets are not supported for windows with multiple order > expressions. > {code} > which is confusing in this case because we don't have any order expression > here. > How about add a check on > {code} > orderSpec.isEmpty > {code} > at > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala#L141 > and throw an exception saying "no order expressions is specified"? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-3261) KMeans clusterer can return duplicate cluster centers
[ https://issues.apache.org/jira/browse/SPARK-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-3261: Assignee: Sean Owen > KMeans clusterer can return duplicate cluster centers > - > > Key: SPARK-3261 > URL: https://issues.apache.org/jira/browse/SPARK-3261 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.0.2 >Reporter: Derrick Burns >Assignee: Sean Owen >Priority: Minor > Labels: clustering > Fix For: 2.1.0 > > > This is a bad design choice. I think that it is preferable to produce no > duplicate cluster centers. So instead of forcing the number of clusters to be > K, return at most K clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-18174) Avoid Implicit Type Cast in Arguments of Expressions Extending String2StringExpression
Xiao Li created SPARK-18174: --- Summary: Avoid Implicit Type Cast in Arguments of Expressions Extending String2StringExpression Key: SPARK-18174 URL: https://issues.apache.org/jira/browse/SPARK-18174 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.1 Reporter: Xiao Li For the expressions that extend String2StringExpression (lower, upper, ltrim, rtrim, trim and reverse), the Analyzer should not implicitly cast the arguments to string. If users pass some other data type instead of a string, we should issue an exception for this misuse. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
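A small sketch of the behaviour change being proposed (results are indicative; the exact error message would depend on the final patch):

{code}
// Today the integer argument is implicitly cast to string:
spark.sql("SELECT upper(123)").show()   // currently returns "123"

// Under this proposal the same query would fail analysis with a type error
// instead of silently casting the argument to string.
{code}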
[jira] [Assigned] (SPARK-18174) Avoid Implicit Type Cast in Arguments of Expressions Extending String2StringExpression
[ https://issues.apache.org/jira/browse/SPARK-18174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18174: Assignee: Apache Spark > Avoid Implicit Type Cast in Arguments of Expressions Extending > String2StringExpression > -- > > Key: SPARK-18174 > URL: https://issues.apache.org/jira/browse/SPARK-18174 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 >Reporter: Xiao Li >Assignee: Apache Spark > > For the expressions that extend String2StringExpression (lower, upper, ltrim, > rtrim, trim and reverse), Analyzer should not implicitly cast the arguments > to string. If users input the some data types instead of string, we should > issue an exception for this misuse. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-18174) Avoid Implicit Type Cast in Arguments of Expressions Extending String2StringExpression
[ https://issues.apache.org/jira/browse/SPARK-18174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18174: Assignee: (was: Apache Spark) > Avoid Implicit Type Cast in Arguments of Expressions Extending > String2StringExpression > -- > > Key: SPARK-18174 > URL: https://issues.apache.org/jira/browse/SPARK-18174 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 >Reporter: Xiao Li > > For the expressions that extend String2StringExpression (lower, upper, ltrim, > rtrim, trim and reverse), Analyzer should not implicitly cast the arguments > to string. If users input the some data types instead of string, we should > issue an exception for this misuse. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18174) Avoid Implicit Type Cast in Arguments of Expressions Extending String2StringExpression
[ https://issues.apache.org/jira/browse/SPARK-18174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620431#comment-15620431 ] Apache Spark commented on SPARK-18174: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/15690 > Avoid Implicit Type Cast in Arguments of Expressions Extending > String2StringExpression > -- > > Key: SPARK-18174 > URL: https://issues.apache.org/jira/browse/SPARK-18174 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 >Reporter: Xiao Li > > For the expressions that extend String2StringExpression (lower, upper, ltrim, > rtrim, trim and reverse), Analyzer should not implicitly cast the arguments > to string. If users input the some data types instead of string, we should > issue an exception for this misuse. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-18103) Rename *FileCatalog to *FileProvider
[ https://issues.apache.org/jira/browse/SPARK-18103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-18103. - Resolution: Fixed Assignee: Eric Liang Fix Version/s: 2.1.0 > Rename *FileCatalog to *FileProvider > > > Key: SPARK-18103 > URL: https://issues.apache.org/jira/browse/SPARK-18103 > Project: Spark > Issue Type: Improvement >Reporter: Eric Liang >Assignee: Eric Liang >Priority: Minor > Fix For: 2.1.0 > > > In the SQL component there are too many different components called some > variant of *Catalog, which is quite confusing. We should rename the > subclasses of FileCatalog to avoid this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17791) Join reordering using star schema detection
[ https://issues.apache.org/jira/browse/SPARK-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620517#comment-15620517 ] Ioana Delaney commented on SPARK-17791: --- [~ron8hu] I appreciate your comment. Thank you. I agree that the algorithm will have to evolve as CBO introduces new features such as cardinality, predicate selectivity, and ultimately the cost-based planning itself. The current proposal is conservative in choosing a star plan and can be made even more conservative. I can look at what CBO implements today for the number of distinct values and base table cardinality as suggested by [~wangzhenhua]. A check for pseudo RI using these two estimates can be easily incorporated into our current star-schema detection. The algorithm is also disabled by default. We can keep it disabled until we have a tighter integration with CBO. But there are advantages in letting the code in before CBO is completely implemented. From an implementation point of view, this will allow us to incrementally deliver our work. Then, given its good performance results, the feature can be enabled on demand for warehouse workloads that can take advantage of star join planning. Thank you. > Join reordering using star schema detection > --- > > Key: SPARK-17791 > URL: https://issues.apache.org/jira/browse/SPARK-17791 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0 >Reporter: Ioana Delaney >Assignee: Ioana Delaney >Priority: Critical > Attachments: StarJoinReordering1005.doc > > > This JIRA is a sub-task of SPARK-17626. > The objective is to provide a consistent performance improvement for star > schema queries. Star schema consists of one or more fact tables referencing a > number of dimension tables. In general, queries against star schema are > expected to run fast because of the established RI constraints among the > tables. This design proposes a join reordering based on natural, generally > accepted heuristics for star schema queries: > * Finds the star join with the largest fact table and places it on the > driving arm of the left-deep join. This plan avoids large tables on the > inner, and thus favors hash joins. > * Applies the most selective dimensions early in the plan to reduce the > amount of data flow. > The design description is included in the below attached document. > \\ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-17791) Join reordering using star schema detection
[ https://issues.apache.org/jira/browse/SPARK-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620517#comment-15620517 ] Ioana Delaney edited comment on SPARK-17791 at 10/30/16 8:22 PM: - [~ron8hu] I appreciate your comment. Thank you. I agree that the algorithm will have to evolve as CBO introduces new features such as cardinality, predicate selectivity, and ultimately the cost-based planning itself. The current proposal is conservative in choosing a star plan and can be made even more conservative. I can look at what CBO implements today for the number of distinct values and base table cardinality as suggested by [~mikewzh]. A check for pseudo RI using these two estimates can be easily incorporated into our current star-schema detection. The algorithm is also disabled by default. We can keep it disabled until we have a tighter integration with CBO. But there are advantages in letting the code in before CBO is completely implemented. From an implementation point of view, this will allow us to incrementally deliver our work. Then, given its good performance results, the feature can be enabled on demand for warehouse workloads that can take advantage of star join planning. Thank you. was (Author: ioana-delaney): [~ron8hu] I appreciate your comment. Thank you. I agree that the algorithm will have to evolve as CBO introduces new features such as cardinality, predicate selectivity, and ultimately the cost-based planning itself. The current proposal is conservative in choosing a star plan and can be made even more conservative. I can look at what CBO implements today for the number of distinct values and base table cardinality as suggested by [~wangzhenhua]. A check for pseudo RI using these two estimates can be easily incorporated into our current star-schema detection. The algorithm is also disabled by default. We can keep it disabled until we have a tighter integration with CBO. But there are advantages in letting the code in before CBO is completely implemented. From an implementation point of view, this will allow us to incrementally deliver our work. Then, given its good performance results, the feature can be enabled on demand for warehouse workloads that can take advantage of star join planning. Thank you. > Join reordering using star schema detection > --- > > Key: SPARK-17791 > URL: https://issues.apache.org/jira/browse/SPARK-17791 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0 >Reporter: Ioana Delaney >Assignee: Ioana Delaney >Priority: Critical > Attachments: StarJoinReordering1005.doc > > > This JIRA is a sub-task of SPARK-17626. > The objective is to provide a consistent performance improvement for star > schema queries. Star schema consists of one or more fact tables referencing a > number of dimension tables. In general, queries against star schema are > expected to run fast because of the established RI constraints among the > tables. This design proposes a join reordering based on natural, generally > accepted heuristics for star schema queries: > * Finds the star join with the largest fact table and places it on the > driving arm of the left-deep join. This plan avoids large tables on the > inner, and thus favors hash joins. > * Applies the most selective dimensions early in the plan to reduce the > amount of data flow. > The design description is included in the below attached document. 
> \\ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
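For readers unfamiliar with the workload being discussed, an illustrative star-schema query shape (all table and column names are hypothetical): one large fact table joined to several small dimension tables on their keys, with a selective dimension predicate that the reordering would apply early.

{code}
spark.sql("""
  SELECT d1.year, d2.region, SUM(f.amount) AS total
  FROM fact_sales f
  JOIN dim_date    d1 ON f.date_key    = d1.date_key
  JOIN dim_store   d2 ON f.store_key   = d2.store_key
  JOIN dim_product d3 ON f.product_key = d3.product_key
  WHERE d3.category = 'Electronics'
  GROUP BY d1.year, d2.region
""").show()
{code}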
[jira] [Assigned] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9487: --- Assignee: (was: Apache Spark) > Use the same num. worker threads in Scala/Python unit tests > --- > > Key: SPARK-9487 > URL: https://issues.apache.org/jira/browse/SPARK-9487 > Project: Spark > Issue Type: Improvement > Components: PySpark, Spark Core, SQL, Tests >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng > Labels: starter > Attachments: ContextCleanerSuiteResults, HeartbeatReceiverSuiteResults > > > In Python we use `local[4]` for unit tests, while in Scala/Java we use > `local[2]` and `local` for some unit tests in SQL, MLLib, and other > components. If the operation depends on partition IDs, e.g., random number > generator, this will lead to different result in Python and Scala/Java. It > would be nice to use the same number in all unit tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620610#comment-15620610 ] Apache Spark commented on SPARK-9487: - User 'skanjila' has created a pull request for this issue: https://github.com/apache/spark/pull/15689 > Use the same num. worker threads in Scala/Python unit tests > --- > > Key: SPARK-9487 > URL: https://issues.apache.org/jira/browse/SPARK-9487 > Project: Spark > Issue Type: Improvement > Components: PySpark, Spark Core, SQL, Tests >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng > Labels: starter > Attachments: ContextCleanerSuiteResults, HeartbeatReceiverSuiteResults > > > In Python we use `local[4]` for unit tests, while in Scala/Java we use > `local[2]` and `local` for some unit tests in SQL, MLLib, and other > components. If the operation depends on partition IDs, e.g., random number > generator, this will lead to different result in Python and Scala/Java. It > would be nice to use the same number in all unit tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9487: --- Assignee: Apache Spark > Use the same num. worker threads in Scala/Python unit tests > --- > > Key: SPARK-9487 > URL: https://issues.apache.org/jira/browse/SPARK-9487 > Project: Spark > Issue Type: Improvement > Components: PySpark, Spark Core, SQL, Tests >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng >Assignee: Apache Spark > Labels: starter > Attachments: ContextCleanerSuiteResults, HeartbeatReceiverSuiteResults > > > In Python we use `local[4]` for unit tests, while in Scala/Java we use > `local[2]` and `local` for some unit tests in SQL, MLLib, and other > components. If the operation depends on partition IDs, e.g., random number > generator, this will lead to different result in Python and Scala/Java. It > would be nice to use the same number in all unit tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620613#comment-15620613 ] Saikat Kanjilal commented on SPARK-9487: [~srowen] Yes I read through that and adjusted the PR title, I will Jenkins test this next, however please do let me know if I can proceed adding more to this PR including python and other parts of the codebase. > Use the same num. worker threads in Scala/Python unit tests > --- > > Key: SPARK-9487 > URL: https://issues.apache.org/jira/browse/SPARK-9487 > Project: Spark > Issue Type: Improvement > Components: PySpark, Spark Core, SQL, Tests >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng > Labels: starter > Attachments: ContextCleanerSuiteResults, HeartbeatReceiverSuiteResults > > > In Python we use `local[4]` for unit tests, while in Scala/Java we use > `local[2]` and `local` for some unit tests in SQL, MLLib, and other > components. If the operation depends on partition IDs, e.g., random number > generator, this will lead to different result in Python and Scala/Java. It > would be nice to use the same number in all unit tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620613#comment-15620613 ] Saikat Kanjilal edited comment on SPARK-9487 at 10/30/16 9:40 PM: -- [~srowen] Yes I read through that link and adjusted the PR title, I will Jenkins test this next, however please do let me know if I can proceed adding more to this PR including python and other parts of the codebase. was (Author: kanjilal): [~srowen] Yes I read through that and adjusted the PR title, I will Jenkins test this next, however please do let me know if I can proceed adding more to this PR including python and other parts of the codebase. > Use the same num. worker threads in Scala/Python unit tests > --- > > Key: SPARK-9487 > URL: https://issues.apache.org/jira/browse/SPARK-9487 > Project: Spark > Issue Type: Improvement > Components: PySpark, Spark Core, SQL, Tests >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng > Labels: starter > Attachments: ContextCleanerSuiteResults, HeartbeatReceiverSuiteResults > > > In Python we use `local[4]` for unit tests, while in Scala/Java we use > `local[2]` and `local` for some unit tests in SQL, MLLib, and other > components. If the operation depends on partition IDs, e.g., random number > generator, this will lead to different result in Python and Scala/Java. It > would be nice to use the same number in all unit tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620613#comment-15620613 ] Saikat Kanjilal edited comment on SPARK-9487 at 10/30/16 9:46 PM: -- [~srowen] Yes I read through that link and adjusted the PR title, however please do let me know if I can proceed adding more to this PR including python and other parts of the codebase. was (Author: kanjilal): [~srowen] Yes I read through that link and adjusted the PR title, I will Jenkins test this next, however please do let me know if I can proceed adding more to this PR including python and other parts of the codebase. > Use the same num. worker threads in Scala/Python unit tests > --- > > Key: SPARK-9487 > URL: https://issues.apache.org/jira/browse/SPARK-9487 > Project: Spark > Issue Type: Improvement > Components: PySpark, Spark Core, SQL, Tests >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng > Labels: starter > Attachments: ContextCleanerSuiteResults, HeartbeatReceiverSuiteResults > > > In Python we use `local[4]` for unit tests, while in Scala/Java we use > `local[2]` and `local` for some unit tests in SQL, MLLib, and other > components. If the operation depends on partition IDs, e.g., random number > generator, this will lead to different result in Python and Scala/Java. It > would be nice to use the same number in all unit tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests
[ https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620657#comment-15620657 ] Saikat Kanjilal commented on SPARK-9487: Added org.apache.spark.mllib unitTest changes to pull request > Use the same num. worker threads in Scala/Python unit tests > --- > > Key: SPARK-9487 > URL: https://issues.apache.org/jira/browse/SPARK-9487 > Project: Spark > Issue Type: Improvement > Components: PySpark, Spark Core, SQL, Tests >Affects Versions: 1.5.0 >Reporter: Xiangrui Meng > Labels: starter > Attachments: ContextCleanerSuiteResults, HeartbeatReceiverSuiteResults > > > In Python we use `local[4]` for unit tests, while in Scala/Java we use > `local[2]` and `local` for some unit tests in SQL, MLLib, and other > components. If the operation depends on partition IDs, e.g., random number > generator, this will lead to different result in Python and Scala/Java. It > would be nice to use the same number in all unit tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-18106) Analyze Table accepts a garbage identifier at the end
[ https://issues.apache.org/jira/browse/SPARK-18106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-18106. --- Resolution: Fixed Assignee: Dongjoon Hyun Fix Version/s: 2.1.0 > Analyze Table accepts a garbage identifier at the end > - > > Key: SPARK-18106 > URL: https://issues.apache.org/jira/browse/SPARK-18106 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Srinath >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 2.1.0 > > > {noformat} > scala> sql("create table test(a int)") > res2: org.apache.spark.sql.DataFrame = [] > scala> sql("analyze table test compute statistics blah") > res3: org.apache.spark.sql.DataFrame = [] > {noformat} > An identifier that is not "noscan" produces an AnalyzeTableCommand with > noscan=false -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
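A minimal sketch of the expected validation (the method name and exception are illustrative, not the actual parser code): only {{NOSCAN}} should be accepted after {{COMPUTE STATISTICS}}, instead of silently treating any trailing identifier as a request for a full scan.
{code}
// Returns the noscan flag for AnalyzeTableCommand; anything other than the
// NOSCAN keyword is rejected instead of being ignored.
def parseAnalyzeOption(trailing: Option[String]): Boolean = trailing match {
  case None => false                                    // full scan: size + row count
  case Some(s) if s.equalsIgnoreCase("noscan") => true  // size only, no scan
  case Some(other) =>
    throw new IllegalArgumentException(
      s"Expected `NOSCAN` instead of `$other` in ANALYZE TABLE statement")
}
{code}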
[jira] [Commented] (SPARK-18174) Avoid Implicit Type Cast in Arguments of Expressions Extending String2StringExpression
[ https://issues.apache.org/jira/browse/SPARK-18174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620685#comment-15620685 ] Herman van Hovell commented on SPARK-18174: --- [~smilegator] Won't this create a regression for users who are relying on this? > Avoid Implicit Type Cast in Arguments of Expressions Extending > String2StringExpression > -- > > Key: SPARK-18174 > URL: https://issues.apache.org/jira/browse/SPARK-18174 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 >Reporter: Xiao Li > > For the expressions that extend String2StringExpression (lower, upper, ltrim, > rtrim, trim and reverse), the Analyzer should not implicitly cast the arguments > to string. If users pass in some other data type instead of a string, we should > issue an exception for this misuse. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
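To make the regression concern concrete, the following illustrative queries rely on today's implicit cast and would start failing if it were removed (this is only a sketch of affected usage, not the proposed patch):
{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[2]").appName("implicit-cast-demo").getOrCreate()

// Both queries currently succeed because the analyzer implicitly casts the
// non-string argument to string before applying the string expression:
spark.sql("SELECT upper(123)").show()   // 123  -> "123"  -> "123"
spark.sql("SELECT trim(42.5)").show()   // 42.5 -> "42.5" -> "42.5"
{code}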
[jira] [Commented] (SPARK-16740) joins.LongToUnsafeRowMap crashes with NegativeArraySizeException
[ https://issues.apache.org/jira/browse/SPARK-16740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620711#comment-15620711 ] Harish commented on SPARK-16740: is this fix is available in 2.0.2 snapshot?. Please confirm > joins.LongToUnsafeRowMap crashes with NegativeArraySizeException > > > Key: SPARK-16740 > URL: https://issues.apache.org/jira/browse/SPARK-16740 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core, SQL >Affects Versions: 2.0.0 >Reporter: Sylvain Zimmer >Assignee: Sylvain Zimmer > Fix For: 2.0.1, 2.1.0 > > > Hello, > Here is a crash in Spark SQL joins, with a minimal reproducible test case. > Interestingly, it only seems to happen when reading Parquet data (I added a > {{crash = True}} variable to show it) > This is an {{left_outer}} example, but it also crashes with a regular > {{inner}} join. > {code} > import os > from pyspark import SparkContext > from pyspark.sql import types as SparkTypes > from pyspark.sql import SQLContext > sc = SparkContext() > sqlc = SQLContext(sc) > schema1 = SparkTypes.StructType([ > SparkTypes.StructField("id1", SparkTypes.LongType(), nullable=True) > ]) > schema2 = SparkTypes.StructType([ > SparkTypes.StructField("id2", SparkTypes.LongType(), nullable=True) > ]) > # Valid Long values (-9223372036854775808 < -5543241376386463808 , > 4661454128115150227 < 9223372036854775807) > data1 = [(4661454128115150227,), (-5543241376386463808,)] > data2 = [(650460285, )] > df1 = sqlc.createDataFrame(sc.parallelize(data1), schema1) > df2 = sqlc.createDataFrame(sc.parallelize(data2), schema2) > crash = True > if crash: > os.system("rm -rf /tmp/sparkbug") > df1.write.parquet("/tmp/sparkbug/vertex") > df2.write.parquet("/tmp/sparkbug/edge") > df1 = sqlc.read.load("/tmp/sparkbug/vertex") > df2 = sqlc.read.load("/tmp/sparkbug/edge") > result_df = df2.join(df1, on=(df1.id1 == df2.id2), how="left_outer") > # Should print [Row(id2=650460285, id1=None)] > print result_df.collect() > {code} > When ran with {{spark-submit}}, the final {{collect()}} call crashes with > this: > {code} > py4j.protocol.Py4JJavaError: An error occurred while calling > o61.collectToPython. 
> : org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:194) > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120) > at > org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenOuter(BroadcastHashJoinExec.scala:242) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:83) > at > org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) > at > org.apache.spark.sql.execution.BatchedDataSourceScanExec.consume(ExistingRDD.scala:225) > at > org.apache.spark.sql.execution.BatchedDataSourceScanExec.doProduce(ExistingRDD.scala:328) > at > org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83) > at > org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78) > at > org.apache.spark.sql.execution.BatchedDataSourceScanExec.produce(ExistingRDD.scala:225) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doProduce(BroadcastHashJoinExec.scala:77) > at > org.apache.spark.sql.execution.CodegenSupport$$anon
[jira] [Commented] (SPARK-15616) Metastore relation should fallback to HDFS size of partitions that are involved in Query if statistics are not available.
[ https://issues.apache.org/jira/browse/SPARK-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620750#comment-15620750 ] Franck Tago commented on SPARK-15616: - So I was not able to use the changes, for the following reasons: 1. I forgot to mention that I am working off the Spark 2.0.1 branch. 2. I get the following error: [info] Compiling 30 Scala sources and 2 Java sources to /export/home/devbld/spark_world/Mercury/pvt/ftago/spark-2.0.1/sql/hive/target/scala-2.11/classes... [error] /export/home/devbld/spark_world/Mercury/pvt/ftago/spark-2.0.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala:295: type mismatch; [error] found : Seq[org.apache.spark.sql.catalyst.expressions.Expression] [error] required: Option[String] [error] MetastoreRelation(databaseName, tableName, partitionPruningPred)(catalogTable, client, sparkSession) [error]^ [error] one error found [error] Compile failed Can you please build a version of this fix off Spark 2.0.1? I tried incorporating your changes, but as the error message above shows, I was not able to. > Metastore relation should fallback to HDFS size of partitions that are > involved in Query if statistics are not available. > - > > Key: SPARK-15616 > URL: https://issues.apache.org/jira/browse/SPARK-15616 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Lianhui Wang > > Currently, if some partitions of a partitioned table are used in a join > operation, we rely on the table size returned by the Metastore to decide > whether we can convert the operation to a broadcast join. > If the Filter can prune some partitions, Hive prunes them before deciding > whether to use broadcast joins, based on the HDFS size of the partitions > involved in the query. Spark SQL should do the same, which would improve > join performance for partitioned tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-17919) Make timeout to RBackend configurable in SparkR
[ https://issues.apache.org/jira/browse/SPARK-17919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-17919. -- Resolution: Fixed Assignee: Hossein Falaki > Make timeout to RBackend configurable in SparkR > --- > > Key: SPARK-17919 > URL: https://issues.apache.org/jira/browse/SPARK-17919 > Project: Spark > Issue Type: Story > Components: SparkR >Affects Versions: 2.0.1 >Reporter: Hossein Falaki >Assignee: Hossein Falaki > > I am working on a project where {{gapply()}} is being used with a large > dataset that happens to be extremely skewed. On that skewed partition, the > user function takes more than 2 hours to return and that turns out to be > larger than the timeout that we hardcode in SparkR for backend connection. > {code} > connectBackend <- function(hostname, port, timeout = 6000) > {code} > Ideally user should be able to reconfigure Spark and increase the timeout. It > should be a small fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17919) Make timeout to RBackend configurable in SparkR
[ https://issues.apache.org/jira/browse/SPARK-17919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-17919: - Fix Version/s: 2.1.0 > Make timeout to RBackend configurable in SparkR > --- > > Key: SPARK-17919 > URL: https://issues.apache.org/jira/browse/SPARK-17919 > Project: Spark > Issue Type: Story > Components: SparkR >Affects Versions: 2.0.1 >Reporter: Hossein Falaki >Assignee: Hossein Falaki > Fix For: 2.1.0 > > > I am working on a project where {{gapply()}} is being used with a large > dataset that happens to be extremely skewed. On that skewed partition, the > user function takes more than 2 hours to return and that turns out to be > larger than the timeout that we hardcode in SparkR for backend connection. > {code} > connectBackend <- function(hostname, port, timeout = 6000) > {code} > Ideally user should be able to reconfigure Spark and increase the timeout. It > should be a small fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-16137) Random Forest wrapper in SparkR
[ https://issues.apache.org/jira/browse/SPARK-16137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-16137. -- Resolution: Fixed Assignee: Felix Cheung Target Version/s: 2.1.0 > Random Forest wrapper in SparkR > --- > > Key: SPARK-16137 > URL: https://issues.apache.org/jira/browse/SPARK-16137 > Project: Spark > Issue Type: Sub-task > Components: ML, SparkR >Affects Versions: 2.1.0 >Reporter: Kai Jiang >Assignee: Felix Cheung > > Implement a wrapper in SparkR to support Random Forest. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16137) Random Forest wrapper in SparkR
[ https://issues.apache.org/jira/browse/SPARK-16137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-16137: - Fix Version/s: 2.1.0 > Random Forest wrapper in SparkR > --- > > Key: SPARK-16137 > URL: https://issues.apache.org/jira/browse/SPARK-16137 > Project: Spark > Issue Type: Sub-task > Components: ML, SparkR >Affects Versions: 2.1.0 >Reporter: Kai Jiang >Assignee: Felix Cheung > Fix For: 2.1.0 > > > Implement a wrapper in SparkR to support Random Forest. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-18110) Missing parameter in Python for RandomForest regression and classification
[ https://issues.apache.org/jira/browse/SPARK-18110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-18110. -- Resolution: Fixed Fix Version/s: 2.1.0 > Missing parameter in Python for RandomForest regression and classification > -- > > Key: SPARK-18110 > URL: https://issues.apache.org/jira/browse/SPARK-18110 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.0.1 >Reporter: Felix Cheung >Assignee: Felix Cheung > Fix For: 2.1.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18174) Avoid Implicit Type Cast in Arguments of Expressions Extending String2StringExpression
[ https://issues.apache.org/jira/browse/SPARK-18174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620785#comment-15620785 ] Xiao Li commented on SPARK-18174: - Yeah, your concern is right: the impact of implicit type casting is not small. I posted some thoughts in the PR. In any case, we should not merge this PR. > Avoid Implicit Type Cast in Arguments of Expressions Extending > String2StringExpression > -- > > Key: SPARK-18174 > URL: https://issues.apache.org/jira/browse/SPARK-18174 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 >Reporter: Xiao Li > > For the expressions that extend String2StringExpression (lower, upper, ltrim, > rtrim, trim and reverse), the Analyzer should not implicitly cast the arguments > to string. If users pass in some other data type instead of a string, we should > issue an exception for this misuse. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16740) joins.LongToUnsafeRowMap crashes with NegativeArraySizeException
[ https://issues.apache.org/jira/browse/SPARK-16740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620813#comment-15620813 ] Dongjoon Hyun commented on SPARK-16740: --- Hi, [~harishk15] Yep. The patch is still there in branch-2.0. I guess you can test that with Spark 2.0.2-rc1, too. If you think you meet some related issue in 2.0.2-rc1, please file a Jira issue. > joins.LongToUnsafeRowMap crashes with NegativeArraySizeException > > > Key: SPARK-16740 > URL: https://issues.apache.org/jira/browse/SPARK-16740 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core, SQL >Affects Versions: 2.0.0 >Reporter: Sylvain Zimmer >Assignee: Sylvain Zimmer > Fix For: 2.0.1, 2.1.0 > > > Hello, > Here is a crash in Spark SQL joins, with a minimal reproducible test case. > Interestingly, it only seems to happen when reading Parquet data (I added a > {{crash = True}} variable to show it) > This is an {{left_outer}} example, but it also crashes with a regular > {{inner}} join. > {code} > import os > from pyspark import SparkContext > from pyspark.sql import types as SparkTypes > from pyspark.sql import SQLContext > sc = SparkContext() > sqlc = SQLContext(sc) > schema1 = SparkTypes.StructType([ > SparkTypes.StructField("id1", SparkTypes.LongType(), nullable=True) > ]) > schema2 = SparkTypes.StructType([ > SparkTypes.StructField("id2", SparkTypes.LongType(), nullable=True) > ]) > # Valid Long values (-9223372036854775808 < -5543241376386463808 , > 4661454128115150227 < 9223372036854775807) > data1 = [(4661454128115150227,), (-5543241376386463808,)] > data2 = [(650460285, )] > df1 = sqlc.createDataFrame(sc.parallelize(data1), schema1) > df2 = sqlc.createDataFrame(sc.parallelize(data2), schema2) > crash = True > if crash: > os.system("rm -rf /tmp/sparkbug") > df1.write.parquet("/tmp/sparkbug/vertex") > df2.write.parquet("/tmp/sparkbug/edge") > df1 = sqlc.read.load("/tmp/sparkbug/vertex") > df2 = sqlc.read.load("/tmp/sparkbug/edge") > result_df = df2.join(df1, on=(df1.id1 == df2.id2), how="left_outer") > # Should print [Row(id2=650460285, id1=None)] > print result_df.collect() > {code} > When ran with {{spark-submit}}, the final {{collect()}} call crashes with > this: > {code} > py4j.protocol.Py4JJavaError: An error occurred while calling > o61.collectToPython. 
> : org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:194) > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120) > at > org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenOuter(BroadcastHashJoinExec.scala:242) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:83) > at > org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) > at > org.apache.spark.sql.execution.BatchedDataSourceScanExec.consume(ExistingRDD.scala:225) > at > org.apache.spark.sql.execution.BatchedDataSourceScanExec.doProduce(ExistingRDD.scala:328) > at > org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83) > at > org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78) > at > org.apache.spark.sql.execution.BatchedDataSourceScanExec.produce(ExistingRDD.scala:225) > at > org.apache.sp
[jira] [Commented] (SPARK-12648) UDF with Option[Double] throws ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-12648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620882#comment-15620882 ] Grant Neale commented on SPARK-12648: - This works for single, argument UDFs. However, one may want to define a multi-argument UDF that allows some arguments to be null. > UDF with Option[Double] throws ClassCastException > - > > Key: SPARK-12648 > URL: https://issues.apache.org/jira/browse/SPARK-12648 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Mikael Valot > > I can write an UDF that returns an Option[Double], and the DataFrame's > schema is correctly inferred to be a nullable double. > However I cannot seem to be able to write a UDF that takes an Option as an > argument: > import org.apache.spark.sql.SQLContext > import org.apache.spark.{SparkContext, SparkConf} > val conf = new SparkConf().setMaster("local[4]").setAppName("test") > val sc = new SparkContext(conf) > val sqlc = new SQLContext(sc) > import sqlc.implicits._ > val df = sc.parallelize(List(("a", Some(4D)), ("b", None))).toDF("name", > "weight") > import org.apache.spark.sql.functions._ > val addTwo = udf((d: Option[Double]) => d.map(_+2)) > df.withColumn("plusTwo", addTwo(df("weight"))).show() > => > 2016-01-05T14:41:52 Executor task launch worker-0 ERROR > org.apache.spark.executor.Executor Exception in task 0.0 in stage 1.0 (TID 1) > java.lang.ClassCastException: java.lang.Double cannot be cast to scala.Option > at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1.apply(:18) > ~[na:na] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) ~[na:na] > at > org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51) > ~[spark-sql_2.10-1.6.0.jar:1.6.0] > at > org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49) > ~[spark-sql_2.10-1.6.0.jar:1.6.0] > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > ~[scala-library-2.10.5.jar:na] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
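A sketch of the usual workaround for the multi-argument case (assuming Spark 2.x; the column and function names are made up): accept boxed {{java.lang.Double}} arguments, which can carry null, and return an {{Option}}, which Spark already maps to a nullable double column as noted in the description.
{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder()
  .master("local[2]").appName("nullable-udf-demo").getOrCreate()
import spark.implicits._

val df = Seq[(String, Option[Double], Option[Double])](
  ("a", Some(4.0), Some(1.0)), ("b", None, Some(2.0))
).toDF("name", "w1", "w2")

// Boxed java.lang.Double arguments can be null; check them explicitly and
// return an Option so the result column stays nullable.
val addNullable = udf { (a: java.lang.Double, b: java.lang.Double) =>
  if (a == null || b == null) None else Some(a + b)
}

df.withColumn("sum", addNullable($"w1", $"w2")).show()
{code}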
[jira] [Commented] (SPARK-16740) joins.LongToUnsafeRowMap crashes with NegativeArraySizeException
[ https://issues.apache.org/jira/browse/SPARK-16740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620885#comment-15620885 ] Harish commented on SPARK-16740: Thank you. I downloaded the 2.0.2 snapshot with 2.7 Hadoop (i think its on 10/13). I can still reproduce this issue. If the "2.0.2-rc1" was updated after 10/13 then i will take the updates and try. Can you please help me to find the latest download path.? > joins.LongToUnsafeRowMap crashes with NegativeArraySizeException > > > Key: SPARK-16740 > URL: https://issues.apache.org/jira/browse/SPARK-16740 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core, SQL >Affects Versions: 2.0.0 >Reporter: Sylvain Zimmer >Assignee: Sylvain Zimmer > Fix For: 2.0.1, 2.1.0 > > > Hello, > Here is a crash in Spark SQL joins, with a minimal reproducible test case. > Interestingly, it only seems to happen when reading Parquet data (I added a > {{crash = True}} variable to show it) > This is an {{left_outer}} example, but it also crashes with a regular > {{inner}} join. > {code} > import os > from pyspark import SparkContext > from pyspark.sql import types as SparkTypes > from pyspark.sql import SQLContext > sc = SparkContext() > sqlc = SQLContext(sc) > schema1 = SparkTypes.StructType([ > SparkTypes.StructField("id1", SparkTypes.LongType(), nullable=True) > ]) > schema2 = SparkTypes.StructType([ > SparkTypes.StructField("id2", SparkTypes.LongType(), nullable=True) > ]) > # Valid Long values (-9223372036854775808 < -5543241376386463808 , > 4661454128115150227 < 9223372036854775807) > data1 = [(4661454128115150227,), (-5543241376386463808,)] > data2 = [(650460285, )] > df1 = sqlc.createDataFrame(sc.parallelize(data1), schema1) > df2 = sqlc.createDataFrame(sc.parallelize(data2), schema2) > crash = True > if crash: > os.system("rm -rf /tmp/sparkbug") > df1.write.parquet("/tmp/sparkbug/vertex") > df2.write.parquet("/tmp/sparkbug/edge") > df1 = sqlc.read.load("/tmp/sparkbug/vertex") > df2 = sqlc.read.load("/tmp/sparkbug/edge") > result_df = df2.join(df1, on=(df1.id1 == df2.id2), how="left_outer") > # Should print [Row(id2=650460285, id1=None)] > print result_df.collect() > {code} > When ran with {{spark-submit}}, the final {{collect()}} call crashes with > this: > {code} > py4j.protocol.Py4JJavaError: An error occurred while calling > o61.collectToPython. 
> : org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:194) > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120) > at > org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenOuter(BroadcastHashJoinExec.scala:242) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:83) > at > org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) > at > org.apache.spark.sql.execution.BatchedDataSourceScanExec.consume(ExistingRDD.scala:225) > at > org.apache.spark.sql.execution.BatchedDataSourceScanExec.doProduce(ExistingRDD.scala:328) > at > org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83) > at > org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78) > at > org.apache.spark.sql.execution.BatchedDataSourceScanExec.produce(Exis
[jira] [Created] (SPARK-18175) Improve the test case coverage of implicit type casting
Xiao Li created SPARK-18175: --- Summary: Improve the test case coverage of implicit type casting Key: SPARK-18175 URL: https://issues.apache.org/jira/browse/SPARK-18175 Project: Spark Issue Type: Test Components: SQL Affects Versions: 2.0.1 Reporter: Xiao Li So far, we have limited test case coverage about implicit type casting. We need to draw a matrix to find all the possible casting pairs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
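An illustrative sketch of what such a matrix could enumerate (the type list and the check are placeholders, not the actual test suite; a real suite would assert the analyzer's implicit-cast rule for each cell rather than the explicit {{Cast.canCast}} used here):
{code}
import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.spark.sql.types._

val atomicTypes: Seq[DataType] = Seq(
  NullType, BooleanType, ByteType, ShortType, IntegerType, LongType,
  FloatType, DoubleType, DecimalType.SYSTEM_DEFAULT, StringType,
  BinaryType, DateType, TimestampType)

// Enumerate every ordered (from, to) pair; a real test would assert the
// expected result for each cell of the matrix instead of printing it.
for (from <- atomicTypes; to <- atomicTypes) {
  println(s"$from -> $to : ${if (Cast.canCast(from, to)) "castable" else "not castable"}")
}
{code}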
[jira] [Assigned] (SPARK-18175) Improve the test case coverage of implicit type casting
[ https://issues.apache.org/jira/browse/SPARK-18175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18175: Assignee: (was: Apache Spark) > Improve the test case coverage of implicit type casting > --- > > Key: SPARK-18175 > URL: https://issues.apache.org/jira/browse/SPARK-18175 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.0.1 >Reporter: Xiao Li > > So far, we have limited test case coverage about implicit type casting. We > need to draw a matrix to find all the possible casting pairs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18175) Improve the test case coverage of implicit type casting
[ https://issues.apache.org/jira/browse/SPARK-18175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620930#comment-15620930 ] Apache Spark commented on SPARK-18175: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/15691 > Improve the test case coverage of implicit type casting > --- > > Key: SPARK-18175 > URL: https://issues.apache.org/jira/browse/SPARK-18175 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.0.1 >Reporter: Xiao Li > > So far, we have limited test case coverage about implicit type casting. We > need to draw a matrix to find all the possible casting pairs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-18175) Improve the test case coverage of implicit type casting
[ https://issues.apache.org/jira/browse/SPARK-18175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18175: Assignee: Apache Spark > Improve the test case coverage of implicit type casting > --- > > Key: SPARK-18175 > URL: https://issues.apache.org/jira/browse/SPARK-18175 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.0.1 >Reporter: Xiao Li >Assignee: Apache Spark > > So far, we have limited test case coverage about implicit type casting. We > need to draw a matrix to find all the possible casting pairs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17952) SparkSession createDataFrame method throws exception for nested JavaBeans
[ https://issues.apache.org/jira/browse/SPARK-17952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amit Baghel updated SPARK-17952: Summary: SparkSession createDataFrame method throws exception for nested JavaBeans (was: Java SparkSession createDataFrame method throws exception for nested JavaBeans) > SparkSession createDataFrame method throws exception for nested JavaBeans > - > > Key: SPARK-17952 > URL: https://issues.apache.org/jira/browse/SPARK-17952 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0, 2.0.1 >Reporter: Amit Baghel > > As per latest spark documentation for Java at > http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection, > > {quote} > Nested JavaBeans and List or Array fields are supported though. > {quote} > However nested JavaBean is not working. Please see the below code. > SubCategory class > {code} > public class SubCategory implements Serializable{ > private String id; > private String name; > > public String getId() { > return id; > } > public void setId(String id) { > this.id = id; > } > public String getName() { > return name; > } > public void setName(String name) { > this.name = name; > } > } > {code} > Category class > {code} > public class Category implements Serializable{ > private String id; > private SubCategory subCategory; > > public String getId() { > return id; > } > public void setId(String id) { > this.id = id; > } > public SubCategory getSubCategory() { > return subCategory; > } > public void setSubCategory(SubCategory subCategory) { > this.subCategory = subCategory; > } > } > {code} > SparkSample class > {code} > public class SparkSample { > public static void main(String[] args) throws IOException { > > SparkSession spark = SparkSession > .builder() > .appName("SparkSample") > .master("local") > .getOrCreate(); > //SubCategory > SubCategory sub = new SubCategory(); > sub.setId("sc-111"); > sub.setName("Sub-1"); > //Category > Category category = new Category(); > category.setId("s-111"); > category.setSubCategory(sub); > //categoryList > List categoryList = new ArrayList(); > categoryList.add(category); >//DF > Dataset dframe = spark.createDataFrame(categoryList, > Category.class); > dframe.show(); > } > } > {code} > Above code throws below error. 
> {code} > Exception in thread "main" scala.MatchError: com.sample.SubCategory@e7391d > (of class com.sample.SubCategory) > at > org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:256) > at > org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:251) > at > org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103) > at > org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:403) > at > org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1106) > at > org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1106) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) > at > org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1106) > at > org.apache.spark.sql.SQLContext$$anonfun$beansToRows$1.apply(SQLContext.scala:1104) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at scala.collection.Iterator$class.toStream(Iterator.scala:1322) > at scala.collection.AbstractIterator.toStream(Iterator.scala:1336) > at > scala.colle
[jira] [Commented] (SPARK-17791) Join reordering using star schema detection
[ https://issues.apache.org/jira/browse/SPARK-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15621070#comment-15621070 ] Zhenhua Wang commented on SPARK-17791: -- Hi Ioana, The current implementation is NOT ready because we need to refactor the statistics structure to make it easier to use during cost estimation, it won't be stable until we finish the estimation part. I think it is necessary and important to use CBO based RI. You can start to incorporate it in the algorithm and rebase after the related code refactor is finished. Thanks. > Join reordering using star schema detection > --- > > Key: SPARK-17791 > URL: https://issues.apache.org/jira/browse/SPARK-17791 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.1.0 >Reporter: Ioana Delaney >Assignee: Ioana Delaney >Priority: Critical > Attachments: StarJoinReordering1005.doc > > > This JIRA is a sub-task of SPARK-17626. > The objective is to provide a consistent performance improvement for star > schema queries. Star schema consists of one or more fact tables referencing a > number of dimension tables. In general, queries against star schema are > expected to run fast because of the established RI constraints among the > tables. This design proposes a join reordering based on natural, generally > accepted heuristics for star schema queries: > * Finds the star join with the largest fact table and places it on the > driving arm of the left-deep join. This plan avoids large tables on the > inner, and thus favors hash joins. > * Applies the most selective dimensions early in the plan to reduce the > amount of data flow. > The design description is included in the below attached document. > \\ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
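For illustration, a star-schema query of the kind the heuristic targets (all table names are made up and assumed to be registered; this example is not taken from the design document): {{sales}} plays the fact-table role, and the filtered {{date_dim}} is the most selective dimension, so it would be joined first while {{sales}} stays on the driving arm of the left-deep join.
{code}
// Illustrative query only.
val starQuery =
  """
    |SELECT d.year, p.category, SUM(s.amount) AS total
    |FROM sales s
    |JOIN date_dim d    ON s.date_id    = d.date_id
    |JOIN product_dim p ON s.product_id = p.product_id
    |WHERE d.year = 2016
    |GROUP BY d.year, p.category
  """.stripMargin
// spark.sql(starQuery).explain()  // would show the reordered join once implemented
{code}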
[jira] [Created] (SPARK-18176) Kafka010 .createRDD() scala API should expect scala Map
Liwei Lin created SPARK-18176: - Summary: Kafka010 .createRDD() scala API should expect scala Map Key: SPARK-18176 URL: https://issues.apache.org/jira/browse/SPARK-18176 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 2.0.1, 2.0.0 Reporter: Liwei Lin Throughout {{external/kafka-010}}, Java APIs are expecting {{java.lang.Maps}} and Scala APIs are expecting {{scala.collection.Maps}}, with the exception of the {{KafkaUtils.createRDD()}} Scala API, which expects a {{java.lang.Map}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
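To illustrate the inconsistency (a sketch against the kafka-0-10 connector API as of 2.0.x; the broker, topic, and group names are placeholders): a Scala caller naturally builds a Scala {{Map}} but must convert it with {{.asJava}} because this particular overload expects a {{java.util.Map}}.
{code}
import scala.collection.JavaConverters._
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategies, OffsetRange}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("kafka-rdd-demo"))

// A Scala caller naturally builds a scala.collection Map of consumer params...
val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "localhost:9092",
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "example-group")

val offsetRanges = Array(OffsetRange("example-topic", 0, 0L, 100L))

// ...but the current Scala createRDD overload takes a java.util.Map, so an
// explicit conversion is needed; the proposal is to accept a Scala Map here,
// in line with the rest of the kafka-0-10 Scala API.
val rdd = KafkaUtils.createRDD[String, String](
  sc, kafkaParams.asJava, offsetRanges, LocationStrategies.PreferConsistent)
{code}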
[jira] [Updated] (SPARK-18176) Kafka010 .createRDD() scala API should expect scala Map
[ https://issues.apache.org/jira/browse/SPARK-18176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liwei Lin updated SPARK-18176: -- Description: Throughout {{external/kafka-010}}, Java APIs are expecting {{java.lang.Maps}} and Scala APIs are expecting {{scala.collection.Maps}}, with the exception of the {{KafkaUtils.createRDD()}} Scala API, which expects a {{java.lang.Map}}. But please note, this is a public API change. was:Throughout {{external/kafka-010}}, Java APIs are expecting {{java.lang.Maps}} and Scala APIs are expecting {{scala.collection.Maps}}, with the exception of the {{KafkaUtils.createRDD()}} Scala API, which expects a {{java.lang.Map}}. > Kafka010 .createRDD() scala API should expect scala Map > --- > > Key: SPARK-18176 > URL: https://issues.apache.org/jira/browse/SPARK-18176 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 2.0.0, 2.0.1 >Reporter: Liwei Lin > > Throughout {{external/kafka-010}}, Java APIs are expecting {{java.lang.Maps}} > and Scala APIs are expecting {{scala.collection.Maps}}, with the exception of > the {{KafkaUtils.createRDD()}} Scala API, which expects a {{java.lang.Map}}. > But please note, this is a public API change. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-18176) Kafka010 .createRDD() scala API should expect scala Map
[ https://issues.apache.org/jira/browse/SPARK-18176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18176: Assignee: (was: Apache Spark) > Kafka010 .createRDD() scala API should expect scala Map > --- > > Key: SPARK-18176 > URL: https://issues.apache.org/jira/browse/SPARK-18176 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 2.0.0, 2.0.1 >Reporter: Liwei Lin > > Throughout {{external/kafka-010}}, Java APIs are expecting {{java.lang.Maps}} > and Scala APIs are expecting {{scala.collection.Maps}}, with the exception of > the {{KafkaUtils.createRDD()}} Scala API, which expects a {{java.lang.Map}}. > But please note, this is a public API change. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18176) Kafka010 .createRDD() scala API should expect scala Map
[ https://issues.apache.org/jira/browse/SPARK-18176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15621075#comment-15621075 ] Apache Spark commented on SPARK-18176: -- User 'lw-lin' has created a pull request for this issue: https://github.com/apache/spark/pull/15681 > Kafka010 .createRDD() scala API should expect scala Map > --- > > Key: SPARK-18176 > URL: https://issues.apache.org/jira/browse/SPARK-18176 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 2.0.0, 2.0.1 >Reporter: Liwei Lin > > Throughout {{external/kafka-010}}, Java APIs are expecting {{java.lang.Maps}} > and Scala APIs are expecting {{scala.collection.Maps}}, with the exception of > the {{KafkaUtils.createRDD()}} Scala API, which expects a {{java.lang.Map}}. > But please note, this is a public API change. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-18176) Kafka010 .createRDD() scala API should expect scala Map
[ https://issues.apache.org/jira/browse/SPARK-18176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18176: Assignee: Apache Spark > Kafka010 .createRDD() scala API should expect scala Map > --- > > Key: SPARK-18176 > URL: https://issues.apache.org/jira/browse/SPARK-18176 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 2.0.0, 2.0.1 >Reporter: Liwei Lin >Assignee: Apache Spark > > Throughout {{external/kafka-010}}, Java APIs are expecting {{java.lang.Maps}} > and Scala APIs are expecting {{scala.collection.Maps}}, with the exception of > the {{KafkaUtils.createRDD()}} Scala API, which expects a {{java.lang.Map}}. > But please note, this is a public API change. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org