[jira] [Commented] (SPARK-14840) Cannot drop a table which has the name starting with 'or'
[ https://issues.apache.org/jira/browse/SPARK-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253400#comment-15253400 ]

Bo Meng commented on SPARK-14840:
---------------------------------

I am testing against master:

1. I do not think your test is valid; at minimum it should be:
sqlContext.sql("drop table tmp.order")

2. It works fine if you wrap {{order}} in backticks ({{`}}); without them, it throws an exception:
sqlContext.sql("drop table `order`");
(I have omitted {{tmp}} here.)

> Cannot drop a table which has the name starting with 'or'
> ---------------------------------------------------------
>
>                 Key: SPARK-14840
>                 URL: https://issues.apache.org/jira/browse/SPARK-14840
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.2
>            Reporter: Kwangwoo Kim
>
> sqlContext("drop table tmp.order")
>
> The above code produces the following error:
>
> 16/04/22 14:27:17 INFO ParseDriver: Parsing command: drop table tmp.order
> 16/04/22 14:27:19 INFO ParseDriver: Parse Completed
> 16/04/22 14:27:19 WARN DropTable: [1.5] failure: identifier expected
> tmp.order
>     ^
> java.lang.RuntimeException: [1.5] failure: identifier expected
> tmp.order
>     ^
> 	at scala.sys.package$.error(package.scala:27)
> 	at org.apache.spark.sql.catalyst.SqlParser$.parseTableIdentifier(SqlParser.scala:58)
> 	at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827)
> 	at org.apache.spark.sql.hive.execution.DropTable.run(commands.scala:62)
> 	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
> 	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
> 	at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> 	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
> 	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
> 	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
> 	at org.apache.spark.sql.DataFrame.(DataFrame.scala:145)
> 	at org.apache.spark.sql.DataFrame.(DataFrame.scala:130)
> 	at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
> 	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
> 	at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:26)
> 	at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:31)
> 	at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:33)
> 	at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC.(:35)
> 	at $line15.$read$$iwC$$iwC$$iwC$$iwC.(:37)
> 	at $line15.$read$$iwC$$iwC$$iwC.(:39)
> 	at $line15.$read$$iwC$$iwC.(:41)
> 	at $line15.$read$$iwC.(:43)
> 	at $line15.$read.(:45)
> 	at $line15.$read$.(:49)
> 	at $line15.$read$.()
> 	at $line15.$eval$.(:7)
> 	at $line15.$eval$.()
> 	at $line15.$eval.$print()
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
> 	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
> 	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
> 	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
> 	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
> 	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
> 	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
> 	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
> 	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
> 	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
> 	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
> 	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
> 	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> 	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> 	at ...
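As the comment above notes, reserved words such as {{order}} must be backtick-quoted before they reach the parser. A minimal sketch of that quoting rule as a hypothetical helper (pure Python, not part of Spark; the keyword set here is an illustrative subset):

```python
# Hypothetical helper: backtick-quote the parts of a dotted table identifier
# that collide with reserved SQL keywords, e.g. "tmp.order" -> "tmp.`order`".
RESERVED = {"order", "select", "from", "where", "group", "by"}  # illustrative subset

def quote_identifier(name: str) -> str:
    parts = name.split(".")
    quoted = [f"`{p}`" if p.lower() in RESERVED else p for p in parts]
    return ".".join(quoted)

print(quote_identifier("tmp.order"))   # tmp.`order`
print(quote_identifier("tmp.people"))  # tmp.people
```

Passing the quoted form (e.g. {{drop table tmp.`order`}}) avoids the "identifier expected" parse failure shown in the report.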
[jira] [Commented] (SPARK-12660) Rewrite except using anti-join
[ https://issues.apache.org/jira/browse/SPARK-12660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253396#comment-15253396 ]

Reynold Xin commented on SPARK-12660:
-------------------------------------

Sure, if it can be done now.

> Rewrite except using anti-join
> ------------------------------
>
>                 Key: SPARK-12660
>                 URL: https://issues.apache.org/jira/browse/SPARK-12660
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Reynold Xin
>
> Similar to SPARK-12656, we can rewrite except at the logical plan level using
> anti-join. This way, we can take advantage of all the benefits of join
> implementations (e.g. managed memory, code generation, broadcast joins).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
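The proposed rewrite turns {{except}} into a left anti-join: emit the distinct rows of the left relation that have no matching row on the right. A minimal sketch of these semantics in plain Python (illustrative only; Spark's actual rewrite operates on logical plans and its hash-based join implementations):

```python
def except_via_anti_join(left, right):
    """SQL EXCEPT semantics: distinct rows of `left` with no matching row in `right`."""
    right_keys = set(right)   # build side, analogous to a broadcast hash set
    seen = set()              # EXCEPT also deduplicates the left side
    out = []
    for row in left:          # stream side: emit each surviving row once
        if row not in right_keys and row not in seen:
            seen.add(row)
            out.append(row)
    return out

print(except_via_anti_join([1, 2, 2, 3], [2, 4]))  # [1, 3]
```

Expressing the operator this way is what lets it inherit join benefits such as managed memory, code generation, and broadcast joins.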
[jira] [Commented] (SPARK-12660) Rewrite except using anti-join
[ https://issues.apache.org/jira/browse/SPARK-12660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253393#comment-15253393 ]

Xiao Li commented on SPARK-12660:
---------------------------------

If nobody has started on this, may I take it?

> Rewrite except using anti-join
> ------------------------------
[jira] [Commented] (SPARK-12660) Rewrite except using anti-join
[ https://issues.apache.org/jira/browse/SPARK-12660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253389#comment-15253389 ]

Reynold Xin commented on SPARK-12660:
-------------------------------------

cc [~hvanhovell] can we do this now?

> Rewrite except using anti-join
> ------------------------------
[jira] [Commented] (SPARK-14840) Cannot drop a table which has the name starting with 'or'
[ https://issues.apache.org/jira/browse/SPARK-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253384#comment-15253384 ]

Kwangwoo Kim commented on SPARK-14840:
--------------------------------------

However, the SQL works successfully in Hive, and there was no problem in Spark 1.4.1.

> Cannot drop a table which has the name starting with 'or'
> ---------------------------------------------------------
[jira] [Commented] (SPARK-14840) Cannot drop a table which has the name starting with 'or'
[ https://issues.apache.org/jira/browse/SPARK-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253376#comment-15253376 ]

Bo Meng commented on SPARK-14840:
---------------------------------

I think it is because {{order}} is a keyword; please try not to use it.

> Cannot drop a table which has the name starting with 'or'
> ---------------------------------------------------------
[jira] [Comment Edited] (SPARK-14840) Cannot drop a table which has the name starting with 'or'
[ https://issues.apache.org/jira/browse/SPARK-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253376#comment-15253376 ]

Bo Meng edited comment on SPARK-14840 at 4/22/16 5:34 AM:
----------------------------------------------------------

I think it is because {{order}} is a keyword; please try not to use it as a table name.

was (Author: bomeng): I think because {{order}} is a keyword, please try not to use it.

> Cannot drop a table which has the name starting with 'or'
> ---------------------------------------------------------
[jira] [Updated] (SPARK-14839) Support for other types as option in OPTIONS clause
[ https://issues.apache.org/jira/browse/SPARK-14839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-14839:
---------------------------------
    Description: 
This was found in https://github.com/apache/spark/pull/12494.

Currently, Spark SQL does not support other types and {{null}} as option values. For example,

{code}
CREATE ... USING csv OPTIONS (path "your-path", quote null)
{code}

throws the exception below:

{code}
Unsupported SQL statement
== SQL ==
CREATE TEMPORARY TABLE carsTable (yearMade double, makeName string, modelName string, comments string, grp string) USING csv OPTIONS (path "your-path", quote null)
org.apache.spark.sql.catalyst.parser.ParseException: Unsupported SQL statement
== SQL ==
CREATE TEMPORARY TABLE carsTable (yearMade double, makeName string, modelName string, comments string, grp string) USING csv OPTIONS (path "your-path", quote null)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.nativeCommand(ParseDriver.scala:66)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:56)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:53)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:86)
	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
	at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:195)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:764)
	...
{code}

Currently, the Scala API accepts options of type {{String}}, {{Long}}, {{Double}} and {{Boolean}}, and the Python API also supports other types. I think supporting these in SQL as well would let us handle data sources in a consistent way.
It seems okay to provide other types as arguments, just like [Microsoft SQL|https://msdn.microsoft.com/en-us/library/ms190322.aspx], because the [SQL-1992|http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt] standard mentions options as below:

{quote}
An implementation remains conforming even if it provides user options to process nonconforming SQL language or to process conforming SQL language in a nonconforming manner.
{quote}

was:
This was found in https://github.com/apache/spark/pull/12494. Currently, Spark SQL does not support other types and {{null}} as a value of an options. For example, {code} ... CREATE ... USING csv OPTIONS (path "your-path", quote null) {code} throws an exception below {code} Unsupported SQL statement == SQL == CREATE TEMPORARY TABLE carsTable (yearMade double, makeName string, modelName string, comments string, grp string) USING csv OPTIONS (path "your-path", quote null) org.apache.spark.sql.catalyst.parser.ParseException: Unsupported SQL statement == SQL == CREATE TEMPORARY TABLE carsTable (yearMade double, makeName string, modelName string, comments string, grp string) USING csv OPTIONS (path "your-path", quote null) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.nativeCommand(ParseDriver.scala:66) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:56) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:53) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:86) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53) at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:195) at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:764) ... {code} Currently, Scala API supports to take options with the types, {{String}}, {{Long}}, {{Double}} and {{Boolean}} and Python API also supports other types. I think in this way we can support data sources in a consistent way. It looks it is okay to to provide other types as arguments just like [Microsoft SQL|https://msdn.microsoft.com/en-us/library/ms190322.aspx] because [SQL-1992|http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt] standard mentions options as below: {quote} An implementation remains conforming even if it provides user options to process nonconforming SQL language or to process conforming SQL language in a nonconforming manner. {quote}

> Support for other types as option in OPTIONS clause
> ---------------------------------------------------
>
>                 Key: SPARK-14839
>                 URL: https://issues.apache.org/jira/browse/SPARK-14839
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Minor
>
> This was found in https://github.com/apache/spark/pull/12494.
> Currently, Spark SQL does not support other types and {{null}} as a
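The gap this issue describes is that the DataFrame-side APIs accept typed option values while the SQL {{OPTIONS}} clause today only accepts string literals. A hypothetical sketch of how typed and {{null}} option values could be rendered into an {{OPTIONS}} clause (pure Python, illustrative only; this is not Spark's implementation, and the rendering rules are assumptions):

```python
def normalize_option(value):
    """Render a typed option value as a SQL OPTIONS literal (hypothetical rules):
    None -> null, booleans -> true/false, numbers keep their literal form,
    and everything else is treated as a quoted string."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    return f'"{value}"'

opts = {"path": "your-path", "quote": None, "header": True, "maxColumns": 20480}
clause = ", ".join(f"{k} {normalize_option(v)}" for k, v in opts.items())
print(f"OPTIONS ({clause})")
# OPTIONS (path "your-path", quote null, header true, maxColumns 20480)
```

Under such rules the failing example above, {{OPTIONS (path "your-path", quote null)}}, would parse, and the SQL path would match what the Scala and Python APIs already accept.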
[jira] [Updated] (SPARK-14840) Cannot drop a table which has the name starting with 'or'
[ https://issues.apache.org/jira/browse/SPARK-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kwangwoo Kim updated SPARK-14840:
---------------------------------
    Description: 
sqlContext("drop table tmp.order")

The above code produces the following error:

16/04/22 14:27:17 INFO ParseDriver: Parsing command: drop table tmp.order
16/04/22 14:27:19 INFO ParseDriver: Parse Completed
16/04/22 14:27:19 WARN DropTable: [1.5] failure: identifier expected
tmp.order
    ^
java.lang.RuntimeException: [1.5] failure: identifier expected
tmp.order
    ^
	at scala.sys.package$.error(package.scala:27)
	at org.apache.spark.sql.catalyst.SqlParser$.parseTableIdentifier(SqlParser.scala:58)
	at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827)
	at org.apache.spark.sql.hive.execution.DropTable.run(commands.scala:62)
	...
[jira] [Created] (SPARK-14840) Cannot drop a table which has the name starting with 'or'
Kwangwoo Kim created SPARK-14840: Summary: Cannot drop a table which has the name starting with 'or' Key: SPARK-14840 URL: https://issues.apache.org/jira/browse/SPARK-14840 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.2 Reporter: Kwangwoo Kim
sqlContext.sql("drop table tmp.order")
The above code produces the following error:
16/04/22 14:27:17 INFO ParseDriver: Parsing command: drop table tmp.order
16/04/22 14:27:19 INFO ParseDriver: Parse Completed
16/04/22 14:27:19 WARN DropTable: [1.5] failure: identifier expected
tmp.order
    ^
java.lang.RuntimeException: [1.5] failure: identifier expected
tmp.order
    ^
	at scala.sys.package$.error(package.scala:27)
	at org.apache.spark.sql.catalyst.SqlParser$.parseTableIdentifier(SqlParser.scala:58)
	at org.apache.spark.sql.SQLContext.table(SQLContext.scala:827)
	at org.apache.spark.sql.hive.execution.DropTable.run(commands.scala:62)
	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
	at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
	at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
	at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
	at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
	at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
	at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
	at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
	at $line15.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
	at $line15.$read$$iwC$$iwC$$iwC.<init>(<console>:39)
	at $line15.$read$$iwC$$iwC.<init>(<console>:41)
	at $line15.$read$$iwC.<init>(<console>:43)
	at $line15.$read.<init>(<console>:45)
	at $line15.$read$.<init>(<console>:49)
	at $line15.$read$.<clinit>(<console>)
	at $line15.$eval$.<init>(<console>:7)
	at $line15.$eval$.<clinit>(<console>)
	at $line15.$eval.$print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
	at org.apache.spark.repl.Main$.main(Main.scala:31)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
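The parse failure occurs because {{order}} is a reserved keyword; wrapping the identifier in backticks (e.g. {{sqlContext.sql("drop table tmp.`order`")}}) avoids it. A minimal plain-Python sketch of that workaround, with a hypothetical {{quote_ident}} helper (not part of any Spark API) that applies Spark SQL's backtick escaping when building the statement:

```python
def quote_ident(name):
    """Backtick-quote an identifier; embedded backticks are doubled,
    following the escaping convention used by Spark SQL."""
    return "`" + name.replace("`", "``") + "`"

def drop_table_sql(db, table):
    # Quote both parts so reserved words such as `order` parse as identifiers.
    return "DROP TABLE {}.{}".format(quote_ident(db), quote_ident(table))

print(drop_table_sql("tmp", "order"))  # DROP TABLE `tmp`.`order`
```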
[jira] [Issue Comment Deleted] (SPARK-14541) SQL function: IFNULL, NULLIF, NVL and NVL2
[ https://issues.apache.org/jira/browse/SPARK-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Meng updated SPARK-14541: Comment: was deleted (was: I will try to do it one by one. ) > SQL function: IFNULL, NULLIF, NVL and NVL2 > -- > > Key: SPARK-14541 > URL: https://issues.apache.org/jira/browse/SPARK-14541 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Davies Liu > > It will be great to have these SQL functions: > IFNULL, NULLIF, NVL, NVL2 > The meaning of these functions could be found in oracle docs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
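For reference, the conventional semantics of these functions (as documented by Oracle and other SQL dialects) can be sketched in plain Python, with {{None}} standing in for SQL NULL; this is an illustration of the expected behaviour, not Spark's implementation:

```python
def ifnull(a, b):
    # IFNULL / NVL: return b when a is NULL, otherwise a.
    return b if a is None else a

def nullif(a, b):
    # NULLIF: return NULL when a equals b, otherwise a.
    return None if a == b else a

def nvl2(a, b, c):
    # NVL2: return b when a is not NULL, otherwise c.
    return b if a is not None else c
```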
[jira] [Created] (SPARK-14839) Support for other types as option in OPTIONS clause
Hyukjin Kwon created SPARK-14839: Summary: Support for other types as option in OPTIONS clause Key: SPARK-14839 URL: https://issues.apache.org/jira/browse/SPARK-14839 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.0.0 Reporter: Hyukjin Kwon Priority: Minor This was found in https://github.com/apache/spark/pull/12494. Currently, Spark SQL does not support types other than strings, or {{null}}, as the value of an option. For example, {code} ... CREATE ... USING csv OPTIONS (path "your-path", quote null) {code} throws the exception below: {code} Unsupported SQL statement == SQL == CREATE TEMPORARY TABLE carsTable (yearMade double, makeName string, modelName string, comments string, grp string) USING csv OPTIONS (path "your-path", quote null) org.apache.spark.sql.catalyst.parser.ParseException: Unsupported SQL statement == SQL == CREATE TEMPORARY TABLE carsTable (yearMade double, makeName string, modelName string, comments string, grp string) USING csv OPTIONS (path "your-path", quote null) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.nativeCommand(ParseDriver.scala:66) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:56) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:53) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:86) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53) at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:195) at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:764) ... {code} Currently, the Scala API accepts options of type {{String}}, {{Long}}, {{Double}} and {{Boolean}}, and the Python API also supports other types. I think in this way we can support data sources in a consistent way. 
It looks like it is okay to provide other types as arguments, just like [Microsoft SQL|https://msdn.microsoft.com/en-us/library/ms190322.aspx], because the [SQL-1992|http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt] standard mentions options as below: {quote} An implementation remains conforming even if it provides user options to process nonconforming SQL language or to process conforming SQL language in a nonconforming manner. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
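One way to support non-string literals in OPTIONS is to normalize each typed value to the string form the data source already expects, mirroring what the Scala API does for {{Long}}, {{Double}} and {{Boolean}} options. A hypothetical plain-Python sketch of such normalization (not the actual parser change):

```python
def option_to_string(value):
    """Normalize a typed OPTIONS value to a string.
    None models a SQL null literal and is passed through for the
    data source to interpret."""
    if value is None:
        return None
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "true" if value else "false"
    return str(value)
```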
[jira] [Updated] (SPARK-10001) Allow Ctrl-C in spark-shell to kill running job
[ https://issues.apache.org/jira/browse/SPARK-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-10001: --- Assignee: Jakob Odersky > Allow Ctrl-C in spark-shell to kill running job > --- > > Key: SPARK-10001 > URL: https://issues.apache.org/jira/browse/SPARK-10001 > Project: Spark > Issue Type: Improvement > Components: Spark Shell >Affects Versions: 1.4.1 >Reporter: Cheolsoo Park >Assignee: Jakob Odersky >Priority: Minor > Fix For: 2.0.0 > > > Hitting Ctrl-C in spark-sql (and other tools like presto) cancels any running > job and starts a new input line on the prompt. It would be nice if > spark-shell also can do that. Otherwise, in case a user submits a job, say he > made a mistake, and wants to cancel it, he needs to exit the shell and > re-login to continue his work. Re-login can be a pain especially in Spark on > yarn, since it takes a while to allocate AM container and initial executors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10001) Allow Ctrl-C in spark-shell to kill running job
[ https://issues.apache.org/jira/browse/SPARK-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-10001. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12557 [https://github.com/apache/spark/pull/12557] > Allow Ctrl-C in spark-shell to kill running job > --- > > Key: SPARK-10001 > URL: https://issues.apache.org/jira/browse/SPARK-10001 > Project: Spark > Issue Type: Improvement > Components: Spark Shell >Affects Versions: 1.4.1 >Reporter: Cheolsoo Park >Priority: Minor > Fix For: 2.0.0 > > > Hitting Ctrl-C in spark-sql (and other tools like presto) cancels any running > job and starts a new input line on the prompt. It would be nice if > spark-shell also can do that. Otherwise, in case a user submits a job, say he > made a mistake, and wants to cancel it, he needs to exit the shell and > re-login to continue his work. Re-login can be a pain especially in Spark on > yarn, since it takes a while to allocate AM container and initial executors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
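The shape of the fix can be sketched as installing a SIGINT handler that cancels running jobs instead of letting Ctrl-C terminate the shell. A simplified Python sketch (the real change lives in the Scala REPL code; {{cancel_all_jobs}} stands in for something like {{SparkContext.cancelAllJobs()}}):

```python
import signal

def install_job_cancel_handler(cancel_all_jobs):
    """On Ctrl-C, cancel running jobs and return to the prompt
    rather than terminating the shell."""
    def handler(signum, frame):
        cancel_all_jobs()
    signal.signal(signal.SIGINT, handler)
    return handler  # returned only so the sketch is easy to exercise
```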
[jira] [Assigned] (SPARK-14791) TPCDS Q23B generate different result each time
[ https://issues.apache.org/jira/browse/SPARK-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14791: Assignee: Apache Spark (was: Davies Liu) > TPCDS Q23B generate different result each time > -- > > Key: SPARK-14791 > URL: https://issues.apache.org/jira/browse/SPARK-14791 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Davies Liu >Assignee: Apache Spark >Priority: Blocker > > Sometimes the number of rows of some operators will become zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14791) TPCDS Q23B generate different result each time
[ https://issues.apache.org/jira/browse/SPARK-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253346#comment-15253346 ] Apache Spark commented on SPARK-14791: -- User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/12600 > TPCDS Q23B generate different result each time > -- > > Key: SPARK-14791 > URL: https://issues.apache.org/jira/browse/SPARK-14791 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu >Priority: Blocker > > Sometimes the number of rows of some operators will become zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14791) TPCDS Q23B generate different result each time
[ https://issues.apache.org/jira/browse/SPARK-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14791: Assignee: Davies Liu (was: Apache Spark) > TPCDS Q23B generate different result each time > -- > > Key: SPARK-14791 > URL: https://issues.apache.org/jira/browse/SPARK-14791 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu >Priority: Blocker > > Sometimes the number of rows of some operators will become zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14791) TPCDS Q23B generate different result each time
[ https://issues.apache.org/jira/browse/SPARK-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-14791: -- Assignee: Davies Liu > TPCDS Q23B generate different result each time > -- > > Key: SPARK-14791 > URL: https://issues.apache.org/jira/browse/SPARK-14791 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu >Priority: Blocker > > Sometimes the number of rows of some operators will become zero. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14835) Remove MetastoreRelation dependency from SQLBuilder
[ https://issues.apache.org/jira/browse/SPARK-14835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-14835. - Resolution: Fixed Fix Version/s: 2.0.0 > Remove MetastoreRelation dependency from SQLBuilder > --- > > Key: SPARK-14835 > URL: https://issues.apache.org/jira/browse/SPARK-14835 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14369) Implement preferredLocations() for FileScanRDD
[ https://issues.apache.org/jira/browse/SPARK-14369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-14369. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12527 [https://github.com/apache/spark/pull/12527] > Implement preferredLocations() for FileScanRDD > -- > > Key: SPARK-14369 > URL: https://issues.apache.org/jira/browse/SPARK-14369 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Cheng Lian >Assignee: Cheng Lian > Fix For: 2.0.0 > > > Implement {{FileScanRDD.preferredLocations()}} to add locality support for > {{HadoopFsRelation}} based data sources. > We should avoid extra block location related RPC costs for S3, which doesn't > provide valid locality information. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
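A sketch of the idea, with a hypothetical input shape (each file described by its URI scheme and block hosts): collect distinct candidate hosts for a partition, and skip object stores such as S3 whose "locality" information is not meaningful:

```python
OBJECT_STORE_SCHEMES = {"s3", "s3a", "s3n"}

def preferred_locations(partition_files):
    """Return distinct candidate hosts for a file-scan partition.
    partition_files: list of (scheme, hosts) tuples (hypothetical shape)."""
    hosts = []
    for scheme, file_hosts in partition_files:
        if scheme in OBJECT_STORE_SCHEMES:
            continue  # avoid block-location lookups that return nothing useful
        for h in file_hosts:
            if h not in hosts:
                hosts.append(h)
    return hosts
```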
[jira] [Commented] (SPARK-14838) Skip automatically broadcast a plan when it contains ObjectProducer
[ https://issues.apache.org/jira/browse/SPARK-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253334#comment-15253334 ] Apache Spark commented on SPARK-14838: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/12599 > Skip automatically broadcast a plan when it contains ObjectProducer > --- > > Key: SPARK-14838 > URL: https://issues.apache.org/jira/browse/SPARK-14838 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Liang-Chi Hsieh > > Spark will determine the plan size to automatically broadcast it or not when > doing join. As it can't estimate object type size, this mechanism will throw > failure as shown in > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56533/consoleFull. > We should fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14838) Skip automatically broadcast a plan when it contains ObjectProducer
[ https://issues.apache.org/jira/browse/SPARK-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14838: Assignee: (was: Apache Spark) > Skip automatically broadcast a plan when it contains ObjectProducer > --- > > Key: SPARK-14838 > URL: https://issues.apache.org/jira/browse/SPARK-14838 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Liang-Chi Hsieh > > Spark will determine the plan size to automatically broadcast it or not when > doing join. As it can't estimate object type size, this mechanism will throw > failure as shown in > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56533/consoleFull. > We should fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14838) Skip automatically broadcast a plan when it contains ObjectProducer
[ https://issues.apache.org/jira/browse/SPARK-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14838: Assignee: Apache Spark > Skip automatically broadcast a plan when it contains ObjectProducer > --- > > Key: SPARK-14838 > URL: https://issues.apache.org/jira/browse/SPARK-14838 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Liang-Chi Hsieh >Assignee: Apache Spark > > Spark will determine the plan size to automatically broadcast it or not when > doing join. As it can't estimate object type size, this mechanism will throw > failure as shown in > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56533/consoleFull. > We should fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14838) Skip automatically broadcast a plan when it contains ObjectProducer
Liang-Chi Hsieh created SPARK-14838: --- Summary: Skip automatically broadcast a plan when it contains ObjectProducer Key: SPARK-14838 URL: https://issues.apache.org/jira/browse/SPARK-14838 Project: Spark Issue Type: Bug Components: SQL Reporter: Liang-Chi Hsieh Spark estimates a plan's size to decide whether to broadcast it automatically when doing a join. Because it cannot estimate the size of object types, this mechanism fails, as shown in https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56533/consoleFull. We should fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
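The proposed guard can be sketched in a few lines: auto-broadcast a join side only when a size estimate exists and the plan contains no ObjectProducer (whose object rows have no estimable size). A plain-Python sketch, assuming a 10 MB threshold (the default of {{spark.sql.autoBroadcastJoinThreshold}}):

```python
AUTO_BROADCAST_THRESHOLD = 10 * 1024 * 1024  # 10 MB default

def can_auto_broadcast(estimated_size, contains_object_producer):
    """Decide whether a join side may be broadcast automatically.
    estimated_size is None when no reliable estimate is available."""
    if contains_object_producer or estimated_size is None:
        return False
    return estimated_size <= AUTO_BROADCAST_THRESHOLD
```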
[jira] [Updated] (SPARK-14680) Support all datatypes to use VectorizedHashmap in TungstenAggregate
[ https://issues.apache.org/jira/browse/SPARK-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-14680: --- Assignee: Sameer Agarwal > Support all datatypes to use VectorizedHashmap in TungstenAggregate > --- > > Key: SPARK-14680 > URL: https://issues.apache.org/jira/browse/SPARK-14680 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Sameer Agarwal >Assignee: Sameer Agarwal > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14680) Support all datatypes to use VectorizedHashmap in TungstenAggregate
[ https://issues.apache.org/jira/browse/SPARK-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-14680. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12440 [https://github.com/apache/spark/pull/12440] > Support all datatypes to use VectorizedHashmap in TungstenAggregate > --- > > Key: SPARK-14680 > URL: https://issues.apache.org/jira/browse/SPARK-14680 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Sameer Agarwal > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14793) Code generation for large complex type exceeds JVM size limit.
[ https://issues.apache.org/jira/browse/SPARK-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-14793: --- Assignee: Takuya Ueshin > Code generation for large complex type exceeds JVM size limit. > -- > > Key: SPARK-14793 > URL: https://issues.apache.org/jira/browse/SPARK-14793 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin > Fix For: 2.0.0 > > > Code generation for complex type, {{CreateArray}}, {{CreateMap}}, > {{CreateStruct}}, {{CreateNamedStruct}}, exceeds JVM size limit for large > elements. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14793) Code generation for large complex type exceeds JVM size limit.
[ https://issues.apache.org/jira/browse/SPARK-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-14793. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12559 [https://github.com/apache/spark/pull/12559] > Code generation for large complex type exceeds JVM size limit. > -- > > Key: SPARK-14793 > URL: https://issues.apache.org/jira/browse/SPARK-14793 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Takuya Ueshin > Fix For: 2.0.0 > > > Code generation for complex type, {{CreateArray}}, {{CreateMap}}, > {{CreateStruct}}, {{CreateNamedStruct}}, exceeds JVM size limit for large > elements. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
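The standard remedy for such codegen blowups is to split the generated expressions across several helper methods so no single method exceeds the JVM's 64 KB bytecode limit. A schematic Python sketch of the chunking step (the actual fix rewrites the Scala code generators; the chunk size here is an arbitrary illustration):

```python
def split_into_methods(expr_snippets, max_per_method=100):
    """Group generated expression snippets into chunks, one helper
    method per chunk, so each method stays under the JVM size limit."""
    return [expr_snippets[i:i + max_per_method]
            for i in range(0, len(expr_snippets), max_per_method)]
```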
[jira] [Commented] (SPARK-14831) Make ML APIs in SparkR consistent
[ https://issues.apache.org/jira/browse/SPARK-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253285#comment-15253285 ] Felix Cheung commented on SPARK-14831: -- I'd argue it is more important that they are like the existing R functions. Granted, they are not consistent and they don't always match what Spark supports, but I think we are expecting a large number of long-time R users, who are very familiar with how to call kmeans, to try to use Spark. However, taking kmeans as an example: these are S4 methods, so it should be possible to define them in such a way that they look like R's kmeans by default. For example, {code} setMethod("kmeans", signature(x = "DataFrame"), function(x, centers, iter.max = 10, algorithm = c("random", "k-means||")) {code} could be changed, as you later suggested (DataFrame followed by the formula), to {code} setMethod("kmeans", signature(data = "DataFrame"), function(data, formula = NULL, centers, iter.max = 10, algorithm = c("random", "k-means||")) {code} > Make ML APIs in SparkR consistent > - > > Key: SPARK-14831 > URL: https://issues.apache.org/jira/browse/SPARK-14831 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng >Priority: Critical > > In current master, we have 4 ML methods in SparkR: > {code:none} > glm(formula, family, data, ...) > kmeans(data, centers, ...) > naiveBayes(formula, data, ...) > survreg(formula, data, ...) > {code} > We tried to keep the signatures similar to existing ones in R. However, if we > put them together, they are not consistent. One example is k-means, which > doesn't accept a formula. Instead of looking at each method independently, we > might want to update the signature of kmeans to > {code:none} > kmeans(formula, data, centers, ...) > {code} > We can also discuss possible global changes here. For example, `glm` puts > `family` before `data` while `kmeans` puts `centers` after `data`. 
This is > not consistent. And logically, the formula doesn't mean anything without > associating with a DataFrame. So it makes more sense to me to have the > following signature: > {code:none} > algorithm(df, formula, [required params], [optional params]) > {code} > If we make this change, we might want to avoid name collisions because they > have different signature. We can use `ml.kmeans`, 'ml.glm`, etc. > Sorry for discussing API changes in the last minute. But I think it would be > better to have consistent signatures in SparkR. > cc: [~shivaram] [~josephkb] [~yanboliang] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
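Rendered schematically in Python (purely illustrative; the real change is to R S4 generics), the uniform ordering {{algorithm(df, formula, [required params], [optional params])}} discussed above would look like:

```python
def ml_kmeans(df, formula=None, centers=2, iter_max=10, algorithm="k-means||"):
    """Data first, then formula, then algorithm-specific params (sketch only)."""
    return {"df": df, "formula": formula, "centers": centers,
            "iter_max": iter_max, "algorithm": algorithm}

def ml_glm(df, formula, family="gaussian"):
    """Same ordering for glm: df, formula, then algorithm-specific params."""
    return {"df": df, "formula": formula, "family": family}
```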
[jira] [Comment Edited] (SPARK-14834) Force adding doc for new api in pyspark with @since annotation
[ https://issues.apache.org/jira/browse/SPARK-14834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253281#comment-15253281 ] Jeff Zhang edited comment on SPARK-14834 at 4/22/16 3:43 AM: - No, just intend to do it in python annotation decorator. See context here https://github.com/apache/spark/pull/10242/files was (Author: zjffdu): No, just intend to do it in python side. See context here https://github.com/apache/spark/pull/10242/files > Force adding doc for new api in pyspark with @since annotation > -- > > Key: SPARK-14834 > URL: https://issues.apache.org/jira/browse/SPARK-14834 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Jeff Zhang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14834) Force adding doc for new api in pyspark with @since annotation
[ https://issues.apache.org/jira/browse/SPARK-14834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253281#comment-15253281 ] Jeff Zhang commented on SPARK-14834: No, just intend to do it in python side. https://github.com/apache/spark/pull/10242/files > Force adding doc for new api in pyspark with @since annotation > -- > > Key: SPARK-14834 > URL: https://issues.apache.org/jira/browse/SPARK-14834 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Jeff Zhang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-14834) Force adding doc for new api in pyspark with @since annotation
[ https://issues.apache.org/jira/browse/SPARK-14834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253281#comment-15253281 ] Jeff Zhang edited comment on SPARK-14834 at 4/22/16 3:42 AM: - No, just intend to do it in python side. See context here https://github.com/apache/spark/pull/10242/files was (Author: zjffdu): No, just intend to do it in python side. https://github.com/apache/spark/pull/10242/files > Force adding doc for new api in pyspark with @since annotation > -- > > Key: SPARK-14834 > URL: https://issues.apache.org/jira/browse/SPARK-14834 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Jeff Zhang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14834) Force adding doc for new api in pyspark with @since annotation
[ https://issues.apache.org/jira/browse/SPARK-14834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated SPARK-14834: --- Priority: Minor (was: Major) > Force adding doc for new api in pyspark with @since annotation > -- > > Key: SPARK-14834 > URL: https://issues.apache.org/jira/browse/SPARK-14834 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Jeff Zhang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14834) Force adding doc for new api in pyspark with @since annotation
[ https://issues.apache.org/jira/browse/SPARK-14834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253278#comment-15253278 ] holdenk commented on SPARK-14834: - Just to be clear this is adding a linter rule yes? > Force adding doc for new api in pyspark with @since annotation > -- > > Key: SPARK-14834 > URL: https://issues.apache.org/jira/browse/SPARK-14834 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Jeff Zhang > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
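A linter-independent way to force this is to make the decorator itself reject undocumented APIs. A hypothetical variant of PySpark's {{@since}} decorator that, in addition to appending a versionadded note to the docstring, insists that a docstring exists:

```python
def since(version):
    """Decorator: record the version an API was added in, and refuse
    to decorate a function that has no docstring."""
    def deco(f):
        if not f.__doc__:
            raise ValueError(
                "%s (added in %s) must have a docstring" % (f.__name__, version))
        f.__doc__ = f.__doc__.rstrip() + "\n\n.. versionadded:: " + version
        return f
    return deco
```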
[jira] [Created] (SPARK-14837) Add support in file stream source for reading new files added to subdirs
Tathagata Das created SPARK-14837: - Summary: Add support in file stream source for reading new files added to subdirs Key: SPARK-14837 URL: https://issues.apache.org/jira/browse/SPARK-14837 Project: Spark Issue Type: Sub-task Reporter: Tathagata Das Assignee: Tathagata Das -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
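The core of such support is recursive discovery of files under the source directory. A simplified Python sketch (Spark's actual source tracks processed files in a metadata log; the {{seen}} set stands in for that log):

```python
import pathlib

def find_new_files(root, seen):
    """Recursively list files under root, including files added to new
    subdirectories, that have not been observed before; mark them seen."""
    new_files = [p for p in sorted(pathlib.Path(root).rglob("*"))
                 if p.is_file() and str(p) not in seen]
    seen.update(str(p) for p in new_files)
    return new_files
```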
[jira] [Commented] (SPARK-14521) StackOverflowError in Kryo when executing TPC-DS
[ https://issues.apache.org/jira/browse/SPARK-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253264#comment-15253264 ] Apache Spark commented on SPARK-14521: -- User 'yzhou2001' has created a pull request for this issue: https://github.com/apache/spark/pull/12598 > StackOverflowError in Kryo when executing TPC-DS > > > Key: SPARK-14521 > URL: https://issues.apache.org/jira/browse/SPARK-14521 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Rajesh Balamohan >Priority: Blocker > > Build details: Spark build from master branch (Apr-10) > DataSet:TPC-DS at 200 GB scale in Parq format stored in hive. > Client: $SPARK_HOME/bin/beeline > Query: TPC-DS Query27 > spark.sql.sources.fileScan=true (this is the default value anyways) > Exception: > {noformat} > Exception in thread "broadcast-exchange-0" java.lang.StackOverflowError > at > com.esotericsoftware.kryo.util.DefaultClassResolver.writeName(DefaultClassResolver.java:108) > at > com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:99) > at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > 
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > at > com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80) > at > com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518) > at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100) > at > com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40) > at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
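The repeating Collection/Field serializer frames show recursion depth tracking the nesting depth of the object graph, which is exactly how a deeply nested plan overflows the stack. The effect is easy to reproduce with any naive recursive traversal (a plain-Python stand-in, not Kryo):

```python
def make_nested(depth):
    """Build a chain of dicts nested `depth` levels deep (iteratively)."""
    node = None
    for _ in range(depth):
        node = {"child": node}
    return node

def walk(node):
    """Naive recursive traversal, like a recursive serializer: the call
    stack grows linearly with the nesting depth of the object graph."""
    if node is None:
        return 0
    return 1 + walk(node["child"])
```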
[jira] [Commented] (SPARK-14836) Zip local jars before uploading to distributed cache
[ https://issues.apache.org/jira/browse/SPARK-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253254#comment-15253254 ]

Apache Spark commented on SPARK-14836:
--------------------------------------

User 'jerryshao' has created a pull request for this issue:
https://github.com/apache/spark/pull/12597

> Zip local jars before uploading to distributed cache
> ----------------------------------------------------
[jira] [Assigned] (SPARK-14836) Zip local jars before uploading to distributed cache
[ https://issues.apache.org/jira/browse/SPARK-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-14836:
------------------------------------

    Assignee:     (was: Apache Spark)

> Zip local jars before uploading to distributed cache
> ----------------------------------------------------
[jira] [Assigned] (SPARK-14836) Zip local jars before uploading to distributed cache
[ https://issues.apache.org/jira/browse/SPARK-14836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-14836:
------------------------------------

    Assignee: Apache Spark

> Zip local jars before uploading to distributed cache
> ----------------------------------------------------
[jira] [Commented] (SPARK-14829) Deprecate GLM APIs using SGD
[ https://issues.apache.org/jira/browse/SPARK-14829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253251#comment-15253251 ]

Apache Spark commented on SPARK-14829:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/12596

> Deprecate GLM APIs using SGD
> ----------------------------
>
>                 Key: SPARK-14829
>                 URL: https://issues.apache.org/jira/browse/SPARK-14829
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>
> I don't know how many times I have heard someone run into issues with
> LinearRegression or LogisticRegression, only to find that it is because they
> are using the SGD implementations in spark.mllib. We should deprecate these
> SGD APIs in 2.0 to encourage users to use LBFGS and the spark.ml
> implementations, which are significantly better.
[jira] [Assigned] (SPARK-14829) Deprecate GLM APIs using SGD
[ https://issues.apache.org/jira/browse/SPARK-14829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-14829:
------------------------------------

    Assignee:     (was: Apache Spark)

> Deprecate GLM APIs using SGD
> ----------------------------
[jira] [Commented] (SPARK-11227) Spark1.5+ HDFS HA mode throw java.net.UnknownHostException: nameservice1
[ https://issues.apache.org/jira/browse/SPARK-11227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253250#comment-15253250 ]

valgrind_girl commented on SPARK-11227:
---------------------------------------

We run into the same problem on Spark 1.6.1 (we are using sparkContext.textFile). It only occurs with spark-submit, while the same code works fine in spark-shell.

> Spark1.5+ HDFS HA mode throw java.net.UnknownHostException: nameservice1
> ------------------------------------------------------------------------
>
>                 Key: SPARK-11227
>                 URL: https://issues.apache.org/jira/browse/SPARK-11227
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.0, 1.5.1
>         Environment: OS: CentOS 6.6
> Memory: 28G
> CPU: 8
> Mesos: 0.22.0
> HDFS: Hadoop 2.6.0-CDH5.4.0 (build by Cloudera Manager)
>            Reporter: Yuri Saito
>
> When running a jar including a Spark job on an HDFS HA cluster with Mesos and
> Spark 1.5.1, the job throws "java.net.UnknownHostException: nameservice1" and fails.
> I do the following in a terminal:
> {code}
> /opt/spark/bin/spark-submit \
>   --class com.example.Job /jobs/job-assembly-1.0.0.jar
> {code}
> The job then throws the message below.
> {code}
> 15/10/21 15:22:12 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0
> (TID 0, spark003.example.com): java.lang.IllegalArgumentException:
> java.net.UnknownHostException: nameservice1
>         at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
>         at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:312)
>         at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:178)
>         at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:665)
>         at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:601)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148)
>         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
>         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>         at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
>         at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
>         at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:436)
>         at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:409)
>         at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1016)
>         at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:1016)
>         at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
>         at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
>         at scala.Option.map(Option.scala:145)
>         at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
>         at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
>         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
>         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.UnknownHostException: nameservice1
>         ... 41 more
> {code}
> But, I changed from Spark Cluster 1.5.1 to Spark Cluster 1.4.0, then run the
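A common cause of this exception is that the failing process has no HDFS HA client configuration for the logical name {{nameservice1}}, so it tries to resolve it as a real hostname. As an illustration only (the NameNode hostnames here are hypothetical, not from this report), the client-side hdfs-site.xml must define the nameservice roughly like this:

```xml
<configuration>
  <!-- logical name the cluster's fs.defaultFS points at -->
  <property>
    <name>dfs.nameservices</name>
    <value>nameservice1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.nameservice1</name>
    <value>nn1,nn2</value>
  </property>
  <!-- hypothetical NameNode addresses -->
  <property>
    <name>dfs.namenode.rpc-address.nameservice1.nn1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.nameservice1.nn2</name>
    <value>namenode2.example.com:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.nameservice1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```

The spark-submit-only symptom in the comment above is consistent with executors not seeing this file (e.g. HADOOP_CONF_DIR not propagated), while spark-shell picks it up from the local environment.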
[jira] [Assigned] (SPARK-14829) Deprecate GLM APIs using SGD
[ https://issues.apache.org/jira/browse/SPARK-14829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-14829:
------------------------------------

    Assignee: Apache Spark

> Deprecate GLM APIs using SGD
> ----------------------------
[jira] [Commented] (SPARK-14829) Deprecate GLM APIs using SGD
[ https://issues.apache.org/jira/browse/SPARK-14829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253245#comment-15253245 ]

zhengruifeng commented on SPARK-14829:
--------------------------------------

[~josephkb] I am working on this.

> Deprecate GLM APIs using SGD
> ----------------------------
[jira] [Commented] (SPARK-14750) Make historyServer refer application log in hdfs
[ https://issues.apache.org/jira/browse/SPARK-14750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253237#comment-15253237 ]

SuYan commented on SPARK-14750:
-------------------------------

yarn.log-aggregation-enable = true
// if true, application logs in each container directory are recycled into the aggregated HDFS folder.

yarn.log-aggregation.retain-seconds = 259200
// how long application logs are retained in the aggregated HDFS folder.

yarn.nodemanager.delete.debug-delay-sec = 600
// for debugging: how long application logs are retained in the NodeManager folder even after they have been recycled into the aggregated folder.

yarn.nodemanager.log.retain-seconds = 86400
// this property applies when log aggregation is disabled: how long logs are retained in the NodeManager folder.

I found an improvement point for that PR: I need to consider the log-aggregation = false case while yarn.nodemanager.log.retain-seconds is some large value.

> Make historyServer refer application log in hdfs
> ------------------------------------------------
>
>                 Key: SPARK-14750
>                 URL: https://issues.apache.org/jira/browse/SPARK-14750
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 1.6.1
>            Reporter: SuYan
>
> Make the history server refer to application logs, just like the MR history server.
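The four YARN properties quoted in the comment above can be collected into a yarn-site.xml fragment. This is an illustrative sketch using the values as quoted, not the full configuration of any particular cluster:

```xml
<configuration>
  <!-- recycle per-container logs into an aggregated HDFS folder -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- retain aggregated logs in HDFS for 259200 s (3 days) -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>259200</value>
  </property>
  <!-- debug aid: keep NodeManager-local logs 600 s after aggregation -->
  <property>
    <name>yarn.nodemanager.delete.debug-delay-sec</name>
    <value>600</value>
  </property>
  <!-- only used when aggregation is disabled: local retention of 86400 s (1 day) -->
  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>86400</value>
  </property>
</configuration>
```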
[jira] [Assigned] (SPARK-14804) Graph vertexRDD/EdgeRDD checkpoint results ClassCastException:
[ https://issues.apache.org/jira/browse/SPARK-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-14804:
------------------------------------

    Assignee:     (was: Apache Spark)

> Graph vertexRDD/EdgeRDD checkpoint results ClassCastException:
> --------------------------------------------------------------
>
>                 Key: SPARK-14804
>                 URL: https://issues.apache.org/jira/browse/SPARK-14804
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.6.1
>            Reporter: SuYan
>            Priority: Minor
>
> {code}
> graph3.vertices.checkpoint()
> graph3.vertices.count()
> graph3.vertices.map(_._2).count()
> {code}
> 16/04/21 21:04:43 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 4.0
> (TID 13, localhost): java.lang.ClassCastException:
> org.apache.spark.graphx.impl.ShippableVertexPartition cannot be cast to scala.Tuple2
>         at com.xiaomi.infra.codelab.spark.Graph2$$anonfun$main$1.apply(Graph2.scala:80)
>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1597)
>         at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1161)
>         at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1161)
>         at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1863)
>         at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1863)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>         at org.apache.spark.scheduler.Task.run(Task.scala:91)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:219)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
> Look at the code:
> {code}
> private[spark] def computeOrReadCheckpoint(split: Partition, context: TaskContext): Iterator[T] = {
>   if (isCheckpointedAndMaterialized) {
>     firstParent[T].iterator(split, context)
>   } else {
>     compute(split, context)
>   }
> }
>
> private[spark] def isCheckpointedAndMaterialized: Boolean = isCheckpointed
>
> override def isCheckpointed: Boolean = {
>   firstParent[(PartitionID, EdgePartition[ED, VD])].isCheckpointed
> }
> {code}
> For VertexRDD or EdgeRDD, the first parent is its partition RDD:
> RDD[ShippableVertexPartition[VD]] / RDD[(PartitionID, EdgePartition[ED, VD])]
> 1. We call vertexRDD.checkpoint; its partitionRDD will be checkpointed, so
> VertexRDD.isCheckpointedAndMaterialized = true.
> 2. Then we call vertexRDD.iterator; because checkpointed = true, it calls
> firstParent.iterator (which is not a CheckpointRDD, but actually the partitionRDD).
> So the returned iterator is Iterator[ShippableVertexPartition], not the expected
> Iterator[(VertexId, VD)].
[jira] [Commented] (SPARK-14804) Graph vertexRDD/EdgeRDD checkpoint results ClassCastException:
[ https://issues.apache.org/jira/browse/SPARK-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253230#comment-15253230 ]

Apache Spark commented on SPARK-14804:
--------------------------------------

User 'suyanNone' has created a pull request for this issue:
https://github.com/apache/spark/pull/12576

> Graph vertexRDD/EdgeRDD checkpoint results ClassCastException:
> --------------------------------------------------------------
[jira] [Assigned] (SPARK-14804) Graph vertexRDD/EdgeRDD checkpoint results ClassCastException:
[ https://issues.apache.org/jira/browse/SPARK-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-14804:
------------------------------------

    Assignee: Apache Spark

> Graph vertexRDD/EdgeRDD checkpoint results ClassCastException:
> --------------------------------------------------------------
[jira] [Commented] (SPARK-14594) Improve error messages for RDD API
[ https://issues.apache.org/jira/browse/SPARK-14594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253229#comment-15253229 ]

Felix Cheung commented on SPARK-14594:
--------------------------------------

Not sure if this was specific to the data types of "the_table", but yes, it works if I try:
{code}
> df <- createDataFrame(sqlContext, iris)
> rdd <- SparkR:::toRDD(df)
> gb <- SparkR:::groupByKey(rdd, 1000)
> first(gb)
[[1]]
[1] 4.3

[[2]]
[[2]][[1]]
[1] 3
{code}
Perhaps try:
{code}
t <- table(sqlContext, 'the_table')
printSchema(t)
{code}
and see what it looks like? Also, is "the_table" from the hive context?

> Improve error messages for RDD API
> ----------------------------------
>
>                 Key: SPARK-14594
>                 URL: https://issues.apache.org/jira/browse/SPARK-14594
>             Project: Spark
>          Issue Type: Improvement
>          Components: SparkR
>    Affects Versions: 1.5.2
>            Reporter: Marco Gaido
>
> When you have an error in your R code using the RDD API, you always get as
> the error message:
> Error in if (returnStatus != 0) { : argument is of length zero
> This is not very useful, and I think it might be better to catch the R
> exception and show it instead.
[jira] [Commented] (SPARK-14314) K-means model persistence in SparkR
[ https://issues.apache.org/jira/browse/SPARK-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253227#comment-15253227 ]

Gayathri Murali commented on SPARK-14314:
-----------------------------------------

[~mengxr] Yes.

> K-means model persistence in SparkR
> -----------------------------------
>
>                 Key: SPARK-14314
>                 URL: https://issues.apache.org/jira/browse/SPARK-14314
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, SparkR
>            Reporter: Xiangrui Meng
[jira] [Commented] (SPARK-14159) StringIndexerModel sets output column metadata incorrectly
[ https://issues.apache.org/jira/browse/SPARK-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253215#comment-15253215 ]

Apache Spark commented on SPARK-14159:
--------------------------------------

User 'jkbradley' has created a pull request for this issue:
https://github.com/apache/spark/pull/12595

> StringIndexerModel sets output column metadata incorrectly
> ----------------------------------------------------------
>
>                 Key: SPARK-14159
>                 URL: https://issues.apache.org/jira/browse/SPARK-14159
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>            Priority: Minor
>             Fix For: 2.0.0
>
> StringIndexerModel.transform sets the output column metadata to use the name
> inputCol. It should not. Fixing this causes a problem with the metadata
> produced by RFormula.
> Fix in RFormula: I added the StringIndexer columns to prefixesToRewrite, and
> I modified VectorAttributeRewriter to find and replace all "prefixes", since
> attributes collect multiple prefixes from StringIndexer + Interaction.
> Note that "prefixes" is no longer accurate, since internal strings may be
> replaced.
[jira] [Updated] (SPARK-14753) remove internal flag in Accumulable
[ https://issues.apache.org/jira/browse/SPARK-14753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan updated SPARK-14753:
--------------------------------

    Issue Type: Sub-task  (was: Improvement)
        Parent: SPARK-14626

> remove internal flag in Accumulable
> -----------------------------------
>
>                 Key: SPARK-14753
>                 URL: https://issues.apache.org/jira/browse/SPARK-14753
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>             Fix For: 2.0.0
[jira] [Created] (SPARK-14836) Zip local jars before uploading to distributed cache
Saisai Shao created SPARK-14836:
-----------------------------------

             Summary: Zip local jars before uploading to distributed cache
                 Key: SPARK-14836
                 URL: https://issues.apache.org/jira/browse/SPARK-14836
             Project: Spark
          Issue Type: Improvement
          Components: YARN
    Affects Versions: 2.0.0
            Reporter: Saisai Shao
            Priority: Minor

Currently, if neither {{spark.yarn.jars}} nor {{spark.yarn.archive}} is set (the default), the Spark-on-YARN code uploads every jar in the folder separately into the distributed cache. This is quite time consuming and very verbose. Instead of uploading the jars separately, this change zips all the jars first and then puts the single archive into the distributed cache.

This significantly improves startup speed: on my local machine it saves around 5 seconds of the startup period, let alone on a real cluster.
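The zip-then-upload idea described above can be sketched as follows. This is a hedged illustration in Python rather than the Scala YARN client code the issue actually changes; the function name and directory layout are hypothetical:

```python
import os
import zipfile


def zip_local_jars(jars_dir: str, archive_path: str) -> str:
    """Bundle every .jar under jars_dir into one zip archive, so a single
    file can be placed in the distributed cache instead of uploading each
    jar separately (the improvement proposed in SPARK-14836)."""
    with zipfile.ZipFile(archive_path, "w") as zf:
        for name in sorted(os.listdir(jars_dir)):
            if name.endswith(".jar"):
                # store under the bare file name, mirroring a flat jars/ dir
                zf.write(os.path.join(jars_dir, name), arcname=name)
    return archive_path
```

One upload of the resulting archive then replaces N uploads of individual jars, which is where the reported startup-time saving comes from.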
[jira] [Commented] (SPARK-14340) Add Scala Example and Description for ml.BisectingKMeans
[ https://issues.apache.org/jira/browse/SPARK-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253189#comment-15253189 ]

zhengruifeng commented on SPARK-14340:
--------------------------------------

Document for BisectingKMeans

> Add Scala Example and Description for ml.BisectingKMeans
> --------------------------------------------------------
>
>                 Key: SPARK-14340
>                 URL: https://issues.apache.org/jira/browse/SPARK-14340
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: zhengruifeng
>            Priority: Minor
>
> 1. Add BisectingKMeans to ml-clustering.md
> 2. Add the missing Scala BisectingKMeansExample
[jira] [Commented] (SPARK-9612) Add instance weight support for GBTs
[ https://issues.apache.org/jira/browse/SPARK-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253185#comment-15253185 ]

Joseph K. Bradley commented on SPARK-9612:
------------------------------------------

Removing target version, but please update as needed [~dbtsai]

> Add instance weight support for GBTs
> ------------------------------------
>
>                 Key: SPARK-9612
>                 URL: https://issues.apache.org/jira/browse/SPARK-9612
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Assignee: DB Tsai
>            Priority: Minor
>
> GBT support for instance weights could be handled by:
> * sampling data before passing it to trees
> * passing weights to trees (requiring weight support for trees first, but
>   probably better in the end)
[jira] [Updated] (SPARK-9612) Add instance weight support for GBTs
[ https://issues.apache.org/jira/browse/SPARK-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-9612:
-------------------------------------

    Target Version/s:   (was: 2.0.0)

> Add instance weight support for GBTs
> ------------------------------------
[jira] [Commented] (SPARK-10408) Autoencoder
[ https://issues.apache.org/jira/browse/SPARK-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253182#comment-15253182 ] Joseph K. Bradley commented on SPARK-10408: --- I'm going to remove the target version since this won't make 2.0. > Autoencoder > --- > > Key: SPARK-10408 > URL: https://issues.apache.org/jira/browse/SPARK-10408 > Project: Spark > Issue Type: Umbrella > Components: ML >Affects Versions: 1.5.0 >Reporter: Alexander Ulanov >Assignee: Alexander Ulanov >Priority: Minor > > Goal: Implement various types of autoencoders > Requirements: > 1)Basic (deep) autoencoder that supports different types of inputs: binary, > real in [0..1]. real in [-inf, +inf] > 2)Sparse autoencoder i.e. L1 regularization. It should be added as a feature > to the MLP and then used here > 3)Denoising autoencoder > 4)Stacked autoencoder for pre-training of deep networks. It should support > arbitrary network layers > References: > 1. Vincent, Pascal, et al. "Extracting and composing robust features with > denoising autoencoders." Proceedings of the 25th international conference on > Machine learning. ACM, 2008. > http://www.iro.umontreal.ca/~vincentp/Publications/denoising_autoencoders_tr1316.pdf > > 2. > http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2011Rifai_455.pdf, > 3. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., and Manzagol, P.-A. > (2010). Stacked denoising autoencoders: Learning useful representations in a > deep network with a local denoising criterion. Journal of Machine Learning > Research, 11(3371–3408). > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.297.3484=rep1=pdf > 4, 5, 6. Bengio, Yoshua, et al. "Greedy layer-wise training of deep > networks." Advances in neural information processing systems 19 (2007): 153. 
> http://www.iro.umontreal.ca/~lisa/pointeurs/dbn_supervised_tr1282.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
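The requirements above (basic, sparse via L1, and denoising autoencoders) can be sketched with a single one-hidden-layer model, where `l1 > 0` adds the sparsity penalty on the codes and `noise > 0` corrupts the input while reconstructing the clean data. This NumPy sketch is only an illustration of the concepts, not the proposed MLP-based Spark implementation:

```python
import numpy as np

def train_autoencoder(X, hidden, l1=0.0, noise=0.0, lr=0.1, epochs=300, seed=0):
    # One-hidden-layer autoencoder: tanh encoder, linear decoder.
    # l1 > 0  -> sparse variant (L1 penalty on the hidden codes)
    # noise>0 -> denoising variant (corrupt input, reconstruct clean X)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        Xin = X + noise * rng.normal(size=X.shape) if noise else X
        H = np.tanh(Xin @ W1 + b1)            # encoder
        R = H @ W2 + b2                       # linear decoder
        err = R - X                           # reconstruct the *clean* input
        losses.append(float((err ** 2).mean() + l1 * np.abs(H).mean()))
        dR = 2.0 * err / err.size             # gradient of mean squared error
        dW2 = H.T @ dR; db2 = dR.sum(axis=0)
        dH = dR @ W2.T + l1 * np.sign(H) / H.size
        dZ = dH * (1.0 - H ** 2)              # tanh derivative
        dW1 = Xin.T @ dZ; db1 = dZ.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return losses
```

Stacked pre-training (requirement 4) would repeat this layer by layer, feeding each trained encoder's codes to the next autoencoder.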
[jira] [Updated] (SPARK-10408) Autoencoder
[ https://issues.apache.org/jira/browse/SPARK-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-10408: -- Target Version/s: (was: 2.0.0) > Autoencoder > --- > > Key: SPARK-10408 > URL: https://issues.apache.org/jira/browse/SPARK-10408 > Project: Spark > Issue Type: Umbrella > Components: ML >Affects Versions: 1.5.0 >Reporter: Alexander Ulanov >Assignee: Alexander Ulanov >Priority: Minor > > Goal: Implement various types of autoencoders > Requirements: > 1)Basic (deep) autoencoder that supports different types of inputs: binary, > real in [0..1]. real in [-inf, +inf] > 2)Sparse autoencoder i.e. L1 regularization. It should be added as a feature > to the MLP and then used here > 3)Denoising autoencoder > 4)Stacked autoencoder for pre-training of deep networks. It should support > arbitrary network layers > References: > 1. Vincent, Pascal, et al. "Extracting and composing robust features with > denoising autoencoders." Proceedings of the 25th international conference on > Machine learning. ACM, 2008. > http://www.iro.umontreal.ca/~vincentp/Publications/denoising_autoencoders_tr1316.pdf > > 2. > http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2011Rifai_455.pdf, > 3. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., and Manzagol, P.-A. > (2010). Stacked denoising autoencoders: Learning useful representations in a > deep network with a local denoising criterion. Journal of Machine Learning > Research, 11(3371–3408). > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.297.3484=rep1=pdf > 4, 5, 6. Bengio, Yoshua, et al. "Greedy layer-wise training of deep > networks." Advances in neural information processing systems 19 (2007): 153. > http://www.iro.umontreal.ca/~lisa/pointeurs/dbn_supervised_tr1282.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14835) Remove MetastoreRelation dependency from SQLBuilder
[ https://issues.apache.org/jira/browse/SPARK-14835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14835: Assignee: Reynold Xin (was: Apache Spark) > Remove MetastoreRelation dependency from SQLBuilder > --- > > Key: SPARK-14835 > URL: https://issues.apache.org/jira/browse/SPARK-14835 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14835) Remove MetastoreRelation dependency from SQLBuilder
[ https://issues.apache.org/jira/browse/SPARK-14835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253174#comment-15253174 ] Apache Spark commented on SPARK-14835: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/12594 > Remove MetastoreRelation dependency from SQLBuilder > --- > > Key: SPARK-14835 > URL: https://issues.apache.org/jira/browse/SPARK-14835 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14835) Remove MetastoreRelation dependency from SQLBuilder
[ https://issues.apache.org/jira/browse/SPARK-14835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14835: Assignee: Apache Spark (was: Reynold Xin) > Remove MetastoreRelation dependency from SQLBuilder > --- > > Key: SPARK-14835 > URL: https://issues.apache.org/jira/browse/SPARK-14835 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14835) Remove MetastoreRelation dependency from SQLBuilder
Reynold Xin created SPARK-14835: --- Summary: Remove MetastoreRelation dependency from SQLBuilder Key: SPARK-14835 URL: https://issues.apache.org/jira/browse/SPARK-14835 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14732) spark.ml GaussianMixture should not use spark.mllib MultivariateGaussian
[ https://issues.apache.org/jira/browse/SPARK-14732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253168#comment-15253168 ] Apache Spark commented on SPARK-14732: -- User 'jkbradley' has created a pull request for this issue: https://github.com/apache/spark/pull/12593 > spark.ml GaussianMixture should not use spark.mllib MultivariateGaussian > > > Key: SPARK-14732 > URL: https://issues.apache.org/jira/browse/SPARK-14732 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.0.0 >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley > > {{org.apache.spark.ml.clustering.GaussianMixtureModel.gaussians}} currently > returns the {{MultivariateGaussian}} type from spark.mllib. We should copy > the MultivariateGaussian class into spark.ml to avoid referencing spark.mllib > types publicly. > I'll put it in mllib-local under > {{spark.ml.stat.distribution.MultivariateGaussian}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14732) spark.ml GaussianMixture should not use spark.mllib MultivariateGaussian
[ https://issues.apache.org/jira/browse/SPARK-14732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14732: Assignee: Apache Spark (was: Joseph K. Bradley) > spark.ml GaussianMixture should not use spark.mllib MultivariateGaussian > > > Key: SPARK-14732 > URL: https://issues.apache.org/jira/browse/SPARK-14732 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.0.0 >Reporter: Joseph K. Bradley >Assignee: Apache Spark > > {{org.apache.spark.ml.clustering.GaussianMixtureModel.gaussians}} currently > returns the {{MultivariateGaussian}} type from spark.mllib. We should copy > the MultivariateGaussian class into spark.ml to avoid referencing spark.mllib > types publicly. > I'll put it in mllib-local under > {{spark.ml.stat.distribution.MultivariateGaussian}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14732) spark.ml GaussianMixture should not use spark.mllib MultivariateGaussian
[ https://issues.apache.org/jira/browse/SPARK-14732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14732: Assignee: Joseph K. Bradley (was: Apache Spark) > spark.ml GaussianMixture should not use spark.mllib MultivariateGaussian > > > Key: SPARK-14732 > URL: https://issues.apache.org/jira/browse/SPARK-14732 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.0.0 >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley > > {{org.apache.spark.ml.clustering.GaussianMixtureModel.gaussians}} currently > returns the {{MultivariateGaussian}} type from spark.mllib. We should copy > the MultivariateGaussian class into spark.ml to avoid referencing spark.mllib > types publicly. > I'll put it in mllib-local under > {{spark.ml.stat.distribution.MultivariateGaussian}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14834) Force adding doc for new api in pyspark with @since annotation
[ https://issues.apache.org/jira/browse/SPARK-14834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated SPARK-14834: --- Summary: Force adding doc for new api in pyspark with @since annotation (was: Require doc for new api in pyspark with @since annotation) > Force adding doc for new api in pyspark with @since annotation > -- > > Key: SPARK-14834 > URL: https://issues.apache.org/jira/browse/SPARK-14834 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Jeff Zhang > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14766) Attribute reference mismatch with Dataset filter + mapPartitions
[ https://issues.apache.org/jira/browse/SPARK-14766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253165#comment-15253165 ] Wenchen Fan commented on SPARK-14766: - Hi [~brkyvz], can you verify if this bug still exists? I can't reproduce it on master. > Attribute reference mismatch with Dataset filter + mapPartitions > > > Key: SPARK-14766 > URL: https://issues.apache.org/jira/browse/SPARK-14766 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Burak Yavuz > > After a filter, the Dataset references seem to be not copied properly leading > to an exception. To reproduce, you may use the following code: > {code} > Seq((1, 1)).toDS().filter(_._1 != 0).mapPartitions { iter => iter }.count() > {code} > Using explain shows the problem: > {code} > == Physical Plan == > !MapPartitions , newInstance(class scala.Tuple2), [input[0, > scala.Tuple2]._1 AS _1#38521,input[0, scala.Tuple2]._2 AS _2#38522] > +- WholeStageCodegen >: +- Filter .apply >: +- INPUT >+- LocalTableScan [_1#38512,_2#38513], [[0,1,1]] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9961) ML prediction abstractions should have defaultEvaluator fields
[ https://issues.apache.org/jira/browse/SPARK-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-9961: - Target Version/s: (was: 2.0.0) > ML prediction abstractions should have defaultEvaluator fields > -- > > Key: SPARK-9961 > URL: https://issues.apache.org/jira/browse/SPARK-9961 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: Joseph K. Bradley > > Predictor and PredictionModel should have abstract defaultEvaluator methods > which return Evaluators. Subclasses like Regressor, Classifier, etc. should > all provide natural evaluators, set to use the correct input columns and > metrics. Concrete classes may later be modified to use other evaluators or > evaluator options. > The initial implementation should be marked as DeveloperApi since we may need > to change the defaults later on. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11559) Make `runs` no effect in k-means
[ https://issues.apache.org/jira/browse/SPARK-11559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253162#comment-15253162 ] Joseph K. Bradley commented on SPARK-11559: --- [~yanboliang] Let's separate out this issue from your current large PR. Could you please send a separate PR to disable "runs?" Thanks! > Make `runs` no effect in k-means > > > Key: SPARK-11559 > URL: https://issues.apache.org/jira/browse/SPARK-11559 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng >Assignee: Yanbo Liang > > We deprecated `runs` in Spark 1.6 (SPARK-11358). In 2.0, we can either remove > `runs` or make it no effect (with warning messages). So we can simplify the > implementation. I prefer the latter for better binary compatibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13448) Document MLlib behavior changes in Spark 2.0
[ https://issues.apache.org/jira/browse/SPARK-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-13448: -- Description: This JIRA keeps a list of MLlib behavior changes in Spark 2.0. So we can remember to add them to the migration guide / release notes. * SPARK-13429: change convergenceTol in LogisticRegressionWithLBFGS from 1e-4 to 1e-6. * SPARK-7780: Intercept will not be regularized if users train binary classification model with L1/L2 Updater by LogisticRegressionWithLBFGS, because it calls ML LogisticRegresson implementation. Meanwhile if users set without regularization, training with or without feature scaling will return the same solution by the same convergence rate(because they run the same code route), this behavior is different from the old API. * SPARK-12363: Bug fix for PowerIterationClustering which will likely change results * SPARK-13048: LDA using the EM optimizer will keep the last checkpoint by default, if checkpointing is being used. * SPARK-12153: Word2Vec now respects sentence boundaries. Previously, it did not handle them correctly. * SPARK-10574: HashingTF uses MurmurHash3 by default in both spark.ml and spark.mllib * SPARK-14768: Remove expectedType arg for PySpark Param was: This JIRA keeps a list of MLlib behavior changes in Spark 2.0. So we can remember to add them to the migration guide / release notes. * SPARK-13429: change convergenceTol in LogisticRegressionWithLBFGS from 1e-4 to 1e-6. * SPARK-7780: Intercept will not be regularized if users train binary classification model with L1/L2 Updater by LogisticRegressionWithLBFGS, because it calls ML LogisticRegresson implementation. Meanwhile if users set without regularization, training with or without feature scaling will return the same solution by the same convergence rate(because they run the same code route), this behavior is different from the old API. 
* SPARK-12363: Bug fix for PowerIterationClustering which will likely change results * SPARK-13048: LDA using the EM optimizer will keep the last checkpoint by default, if checkpointing is being used. * SPARK-12153: Word2Vec now respects sentence boundaries. Previously, it did not handle them correctly. * SPARK-10574: HashingTF uses MurmurHash3 by default in both spark.ml and spark.mllib * SPARK-14768: Remove expectedType arg for PySpark Param ** (*pending further discussion*) > Document MLlib behavior changes in Spark 2.0 > > > Key: SPARK-13448 > URL: https://issues.apache.org/jira/browse/SPARK-13448 > Project: Spark > Issue Type: Documentation > Components: ML, MLlib >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng > > This JIRA keeps a list of MLlib behavior changes in Spark 2.0. So we can > remember to add them to the migration guide / release notes. > * SPARK-13429: change convergenceTol in LogisticRegressionWithLBFGS from 1e-4 > to 1e-6. > * SPARK-7780: Intercept will not be regularized if users train binary > classification model with L1/L2 Updater by LogisticRegressionWithLBFGS, > because it calls ML LogisticRegresson implementation. Meanwhile if users set > without regularization, training with or without feature scaling will return > the same solution by the same convergence rate(because they run the same code > route), this behavior is different from the old API. > * SPARK-12363: Bug fix for PowerIterationClustering which will likely change > results > * SPARK-13048: LDA using the EM optimizer will keep the last checkpoint by > default, if checkpointing is being used. > * SPARK-12153: Word2Vec now respects sentence boundaries. Previously, it did > not handle them correctly. 
> * SPARK-10574: HashingTF uses MurmurHash3 by default in both spark.ml and > spark.mllib > * SPARK-14768: Remove expectedType arg for PySpark Param -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14834) Require doc for new api in pyspark with @since annotation
Jeff Zhang created SPARK-14834: -- Summary: Require doc for new api in pyspark with @since annotation Key: SPARK-14834 URL: https://issues.apache.org/jira/browse/SPARK-14834 Project: Spark Issue Type: Improvement Components: PySpark Reporter: Jeff Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14314) K-means model persistence in SparkR
[ https://issues.apache.org/jira/browse/SPARK-14314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253140#comment-15253140 ] Xiangrui Meng commented on SPARK-14314: --- Please hold until the naive Bayes one gets merged. > K-means model persistence in SparkR > --- > > Key: SPARK-14314 > URL: https://issues.apache.org/jira/browse/SPARK-14314 > Project: Spark > Issue Type: Sub-task > Components: ML, SparkR >Reporter: Xiangrui Meng > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13330) PYTHONHASHSEED is not propgated to python worker
[ https://issues.apache.org/jira/browse/SPARK-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated SPARK-13330: --- Summary: PYTHONHASHSEED is not propgated to python worker (was: PYTHONHASHSEED is not propgated to executor) > PYTHONHASHSEED is not propgated to python worker > > > Key: SPARK-13330 > URL: https://issues.apache.org/jira/browse/SPARK-13330 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.6.0 >Reporter: Jeff Zhang > > when using python 3.3 , PYTHONHASHSEED is only set in driver, but not > propagated to executor, and cause the following error. > {noformat} > File "/Users/jzhang/github/spark/python/pyspark/rdd.py", line 74, in > portable_hash > raise Exception("Randomness of hash of string should be disabled via > PYTHONHASHSEED") > Exception: Randomness of hash of string should be disabled via PYTHONHASHSEED > at > org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166) > at > org.apache.spark.api.python.PythonRunner$$anon$1.(PythonRDD.scala:207) > at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125) > at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:277) > at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:342) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:277) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:77) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:45) > at org.apache.spark.scheduler.Task.run(Task.scala:81) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, 
e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
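The guard that raises the exception above can be sketched as follows. With Python >= 3.2.3, string hashing is salted per process, so the driver and each worker would hash identical keys differently unless PYTHONHASHSEED pins the salt; since the salt is fixed at interpreter startup, the variable must be propagated to the worker's environment before it launches. This is a simplified sketch of the check, not PySpark's exact code:

```python
import os
import sys

def portable_hash(x):
    # Simplified sketch of PySpark's guard in rdd.py: refuse to hash
    # strings when the per-process salt is not pinned, because driver
    # and workers would then partition the same key differently.
    if sys.version_info >= (3, 2, 3) and isinstance(x, str) \
            and "PYTHONHASHSEED" not in os.environ:
        raise RuntimeError(
            "Randomness of hash of string should be disabled via PYTHONHASHSEED")
    return hash(x)
```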
[jira] [Commented] (SPARK-14833) Refactor StreamTests to test for source fault-tolerance correctly.
[ https://issues.apache.org/jira/browse/SPARK-14833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253120#comment-15253120 ] Apache Spark commented on SPARK-14833: -- User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/12592 > Refactor StreamTests to test for source fault-tolerance correctly. > -- > > Key: SPARK-14833 > URL: https://issues.apache.org/jira/browse/SPARK-14833 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > > Current StreamTest allows testing of a streaming Dataset generated explicitly > wraps a source. This is different from the actual production code path where > the source object is dynamically created through a DataSource object every > time a query is started. So all the fault-tolerance testing in > FileSourceSuite and FileSourceStressSuite is not really testing the actual > code path as they are just reusing the FileStreamSource object. > Instead of maintaining a mapping of source --> expected offset in StreamTest > (which requires reuse of source object), it should maintain a mapping of > source index --> offset, so that it is independent of the source object. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14833) Refactor StreamTests to test for source fault-tolerance correctly.
[ https://issues.apache.org/jira/browse/SPARK-14833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14833: Assignee: Apache Spark (was: Tathagata Das) > Refactor StreamTests to test for source fault-tolerance correctly. > -- > > Key: SPARK-14833 > URL: https://issues.apache.org/jira/browse/SPARK-14833 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Apache Spark > > Current StreamTest allows testing of a streaming Dataset generated explicitly > wraps a source. This is different from the actual production code path where > the source object is dynamically created through a DataSource object every > time a query is started. So all the fault-tolerance testing in > FileSourceSuite and FileSourceStressSuite is not really testing the actual > code path as they are just reusing the FileStreamSource object. > Instead of maintaining a mapping of source --> expected offset in StreamTest > (which requires reuse of source object), it should maintain a mapping of > source index --> offset, so that it is independent of the source object. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14833) Refactor StreamTests to test for source fault-tolerance correctly.
[ https://issues.apache.org/jira/browse/SPARK-14833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14833: Assignee: Tathagata Das (was: Apache Spark) > Refactor StreamTests to test for source fault-tolerance correctly. > -- > > Key: SPARK-14833 > URL: https://issues.apache.org/jira/browse/SPARK-14833 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > > Current StreamTest allows testing of a streaming Dataset generated explicitly > wraps a source. This is different from the actual production code path where > the source object is dynamically created through a DataSource object every > time a query is started. So all the fault-tolerance testing in > FileSourceSuite and FileSourceStressSuite is not really testing the actual > code path as they are just reusing the FileStreamSource object. > Instead of maintaining a mapping of source --> expected offset in StreamTest > (which requires reuse of source object), it should maintain a mapping of > source index --> offset, so that it is independent of the source object. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14833) Refactor StreamTests to test for source fault-tolerance correctly.
Tathagata Das created SPARK-14833: - Summary: Refactor StreamTests to test for source fault-tolerance correctly. Key: SPARK-14833 URL: https://issues.apache.org/jira/browse/SPARK-14833 Project: Spark Issue Type: Sub-task Reporter: Tathagata Das Assignee: Tathagata Das Current StreamTest allows testing of a streaming Dataset generated explicitly wraps a source. This is different from the actual production code path where the source object is dynamically created through a DataSource object every time a query is started. So all the fault-tolerance testing in FileSourceSuite and FileSourceStressSuite is not really testing the actual code path as they are just reusing the FileStreamSource object. Instead of maintaining a mapping of source --> expected offset in StreamTest (which requires reuse of source object), it should maintain a mapping of source index --> offset, so that it is independent of the source object. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
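The bookkeeping change described above (source object -> offset versus source index -> offset) can be illustrated in miniature. Keying by object identity silently fails once the source is re-created on restart, which is exactly the production code path; keying by index does not. `FileSource` here is an invented stand-in, not Spark's class:

```python
class FileSource:
    # Stand-in for a streaming source that is re-created by DataSource
    # every time the query is (re)started.
    def __init__(self, index):
        self.index = index

first, restarted = FileSource(0), FileSource(0)

# Object-keyed map: the fresh instance after restart no longer matches.
by_object = {first: 42}
assert restarted not in by_object

# Index-keyed map: independent of the source object's identity.
by_index = {first.index: 42}
assert by_index[restarted.index] == 42
```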
[jira] [Resolved] (SPARK-14824) Rename object HiveContext to something else
[ https://issues.apache.org/jira/browse/SPARK-14824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-14824. - Resolution: Fixed Fix Version/s: 2/ > Rename object HiveContext to something else > --- > > Key: SPARK-14824 > URL: https://issues.apache.org/jira/browse/SPARK-14824 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > Fix For: 2/ > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14824) Rename object HiveContext to something else
[ https://issues.apache.org/jira/browse/SPARK-14824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-14824: Fix Version/s: (was: 2/) 2.0.0 > Rename object HiveContext to something else > --- > > Key: SPARK-14824 > URL: https://issues.apache.org/jira/browse/SPARK-14824 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14832) Refactor DataSource to ensure schema is inferred only once when creating a file stream
[ https://issues.apache.org/jira/browse/SPARK-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14832: Assignee: Apache Spark (was: Tathagata Das) > Refactor DataSource to ensure schema is inferred only once when creating a > file stream > -- > > Key: SPARK-14832 > URL: https://issues.apache.org/jira/browse/SPARK-14832 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Apache Spark > > When creating a file stream using sqlContext.write.stream(), existing files > are scanned twice for finding the schema > - Once, when creating a DataSource + StreamingRelation in the > DataFrameReader.stream() > - Again, when creating streaming Source from the DataSource, in > DataSource.createSource() > Instead, the schema should be generated only once, at the time of creating > the dataframe, and when the streaming source is created, it should just reuse > that schema -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14832) Refactor DataSource to ensure schema is inferred only once when creating a file stream
[ https://issues.apache.org/jira/browse/SPARK-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14832: Assignee: Tathagata Das (was: Apache Spark) > Refactor DataSource to ensure schema is inferred only once when creating a > file stream > -- > > Key: SPARK-14832 > URL: https://issues.apache.org/jira/browse/SPARK-14832 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > > When creating a file stream using sqlContext.write.stream(), existing files > are scanned twice for finding the schema > - Once, when creating a DataSource + StreamingRelation in the > DataFrameReader.stream() > - Again, when creating streaming Source from the DataSource, in > DataSource.createSource() > Instead, the schema should be generated only once, at the time of creating > the dataframe, and when the streaming source is created, it should just reuse > that schema -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14832) Refactor DataSource to ensure schema is inferred only once when creating a file stream
[ https://issues.apache.org/jira/browse/SPARK-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253107#comment-15253107 ] Apache Spark commented on SPARK-14832: -- User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/12591 > Refactor DataSource to ensure schema is inferred only once when creating a > file stream > -- > > Key: SPARK-14832 > URL: https://issues.apache.org/jira/browse/SPARK-14832 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > > When creating a file stream using sqlContext.write.stream(), existing files > are scanned twice for finding the schema > - Once, when creating a DataSource + StreamingRelation in the > DataFrameReader.stream() > - Again, when creating streaming Source from the DataSource, in > DataSource.createSource() > Instead, the schema should be generated only once, at the time of creating > the dataframe, and when the streaming source is created, it should just reuse > that schema -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14832) Refactor DataSource to ensure schema is inferred only once when creating a file stream
[ https://issues.apache.org/jira/browse/SPARK-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-14832: -- Description: When creating a file stream using sqlContext.write.stream(), existing files are scanned twice for finding the schema - Once, when creating a DataSource + StreamingRelation in the DataFrameReader.stream() - Again, when creating streaming Source from the DataSource, in DataSource.createSource() Instead, the schema should be generated only once, at the time of creating the dataframe, and when the streaming source is created, it should just reuse that schema was: When creating a file stream using sqlContext.write.stream(), existing files are scanned twice for finding the schema - Once, when creating a DataSource + StreamingRelation in the DataFrameReader.stream() - Again, when creating streaming Source from the DataSource, in DataSource.createSource() Instead, the schema should be generated only once, at the time of creating the dataframe, and when the streaming source is created, it should just reuse that schame > Refactor DataSource to ensure schema is inferred only once when creating a > file stream > -- > > Key: SPARK-14832 > URL: https://issues.apache.org/jira/browse/SPARK-14832 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > > When creating a file stream using sqlContext.write.stream(), existing files > are scanned twice for finding the schema > - Once, when creating a DataSource + StreamingRelation in the > DataFrameReader.stream() > - Again, when creating streaming Source from the DataSource, in > DataSource.createSource() > Instead, the schema should be generated only once, at the time of creating > the dataframe, and when the streaming source is created, it should just reuse > that schema -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional 
commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14831) Make ML APIs in SparkR consistent
[ https://issues.apache.org/jira/browse/SPARK-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-14831: -- Description: In current master, we have 4 ML methods in SparkR: {code:none} glm(formula, family, data, ...) kmeans(data, centers, ...) naiveBayes(formula, data, ...) survreg(formula, data, ...) {code} We tried to keep the signatures similar to existing ones in R. However, if we put them together, they are not consistent. One example is k-means, which doesn't accept a formula. Instead of looking at each method independently, we might want to update the signature of kmeans to {code:none} kmeans(formula, data, centers, ...) {code} We can also discuss possible global changes here. For example, `glm` puts `family` before `data` while `kmeans` puts `centers` after `data`. This is not consistent. And logically, the formula doesn't mean anything without associating with a DataFrame. So it makes more sense to me to have the following signature: {code:none} algorithm(df, formula, [required params], [optional params]) {code} If we make this change, we might want to avoid name collisions because they have different signatures. We can use `ml.kmeans`, `ml.glm`, etc. Sorry for discussing API changes at the last minute. But I think it would be better to have consistent signatures in SparkR. cc: [~shivaram] [~josephkb] [~yanboliang] was: In current master, we have 4 ML methods in SparkR: {code:none} glm(formula, family, data, ...) kmeans(data, centers, ...) naiveBayes(formula, data, ...) survreg(formula, data, ...) {code} We tried to keep the signatures similar to existing ones in R. However, if we put them together, they are not consistent. One example is k-means, which doesn't accept a formula. Instead of looking at each method independently, we might want to update the signature of kmeans to {code:none} kmeans(formula, data, centers, ...) {code} We can also discuss possible global changes here. 
For example, `glm` puts `family` before `data` while `kmeans` puts `centers` after `data`. This is not consistent. And logically, the formula doesn't mean anything without associating with a DataFrame. So it makes more sense to me to have the following signature: {code:none} algorithm(data, formula, [required params], [optional params]) {code} If we make this change, we might want to avoid name collisions because they have different signatures. We can use `ml.kmeans`, `ml.glm`, etc. Sorry for discussing API changes at the last minute. But I think it would be better to have consistent signatures in SparkR. cc: [~shivaram] [~josephkb] [~yanboliang] > Make ML APIs in SparkR consistent > - > > Key: SPARK-14831 > URL: https://issues.apache.org/jira/browse/SPARK-14831 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng >Priority: Critical > > In current master, we have 4 ML methods in SparkR: > {code:none} > glm(formula, family, data, ...) > kmeans(data, centers, ...) > naiveBayes(formula, data, ...) > survreg(formula, data, ...) > {code} > We tried to keep the signatures similar to existing ones in R. However, if we > put them together, they are not consistent. One example is k-means, which > doesn't accept a formula. Instead of looking at each method independently, we > might want to update the signature of kmeans to > {code:none} > kmeans(formula, data, centers, ...) > {code} > We can also discuss possible global changes here. For example, `glm` puts > `family` before `data` while `kmeans` puts `centers` after `data`. This is > not consistent. And logically, the formula doesn't mean anything without > associating with a DataFrame. 
So it makes more sense to me to have the > following signature: > {code:none} > algorithm(df, formula, [required params], [optional params]) > {code} > If we make this change, we might want to avoid name collisions because they > have different signatures. We can use `ml.kmeans`, `ml.glm`, etc. > Sorry for discussing API changes at the last minute. But I think it would be > better to have consistent signatures in SparkR. > cc: [~shivaram] [~josephkb] [~yanboliang] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14832) Refactor DataSource to ensure schema is inferred only once when creating a file stream
Tathagata Das created SPARK-14832: - Summary: Refactor DataSource to ensure schema is inferred only once when creating a file stream Key: SPARK-14832 URL: https://issues.apache.org/jira/browse/SPARK-14832 Project: Spark Issue Type: Sub-task Components: SQL, Streaming Reporter: Tathagata Das Assignee: Tathagata Das When creating a file stream using sqlContext.write.stream(), existing files are scanned twice for finding the schema - Once, when creating a DataSource + StreamingRelation in the DataFrameReader.stream() - Again, when creating streaming Source from the DataSource, in DataSource.createSource() Instead, the schema should be generated only once, at the time of creating the dataframe, and when the streaming source is created, it should just reuse that schema -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
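The refactoring described above amounts to computing the inferred schema once and letting every later consumer reuse it. A minimal, Spark-independent sketch of that pattern (all names here are hypothetical, chosen for illustration; the real change lives in Spark's `DataSource`/`DataFrameReader` classes):

```scala
// Sketch: memoize an expensive schema inference so both consumers
// (relation creation and source creation) trigger at most one scan.
object SchemaOnce {
  var scanCount = 0 // exposed only to demonstrate the single scan

  final case class Schema(fields: Seq[String])

  class DataSourceSketch(paths: Seq[String]) {
    // A lazy val runs inferSchema() at most once, no matter how
    // many times sourceSchema is subsequently read.
    lazy val sourceSchema: Schema = inferSchema()

    private def inferSchema(): Schema = {
      scanCount += 1 // stands in for listing and sampling existing files
      Schema(Seq("value"))
    }

    def createRelation(): Schema = sourceSchema // first consumer
    def createSource(): Schema = sourceSchema   // second consumer reuses it
  }
}
```

With this shape, calling `createRelation()` and then `createSource()` on the same instance leaves `scanCount` at 1, which is the behavior the ticket asks for.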
[jira] [Updated] (SPARK-14831) Make ML APIs in SparkR consistent
[ https://issues.apache.org/jira/browse/SPARK-14831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-14831: -- Description: In current master, we have 4 ML methods in SparkR: {code:none} glm(formula, family, data, ...) kmeans(data, centers, ...) naiveBayes(formula, data, ...) survreg(formula, data, ...) {code} We tried to keep the signatures similar to existing ones in R. However, if we put them together, they are not consistent. One example is k-means, which doesn't accept a formula. Instead of looking at each method independently, we might want to update the signature of kmeans to {code:none} kmeans(formula, data, centers, ...) {code} We can also discuss possible global changes here. For example, `glm` puts `family` before `data` while `kmeans` puts `centers` after `data`. This is not consistent. And logically, the formula doesn't mean anything without associating with a DataFrame. So it makes more sense to me to have the following signature: {code:none} algorithm(data, formula, [required params], [optional params]) {code} If we make this change, we might want to avoid name collisions because they have different signatures. We can use `ml.kmeans`, `ml.glm`, etc. Sorry for discussing API changes at the last minute. But I think it would be better to have consistent signatures in SparkR. cc: [~shivaram] [~josephkb] [~yanboliang] was: In current master, we have 4 ML methods in SparkR: {code:none} glm(formula, family, data, ...) kmeans(data, centers, ...) naiveBayes(formula, data, ...) survreg(formula, data, ...) {code} We tried to keep the signatures similar to existing ones in R. However, if we put them together, they are not consistent. One example is k-means, which doesn't accept a formula. Instead of looking at each method independently, we might want to update the signature of kmeans to {code:none} kmeans(formula, data, centers, ...) {code} We can also discuss possible global changes here. 
For example, `glm` puts `family` before `data` while `kmeans` puts `centers` after `data`. This is not consistent. And logically, the formula doesn't mean anything without associating with a DataFrame. So it makes more sense to me to have the following signature: {code:none} algorithm(data, formula, [required params], [optional params]) {code} If we make this change, we might want to avoid name collisions because they have different signatures. We can use `ml.kmeans`, `ml.glm`, etc. Sorry for discussing API changes at the last minute. But I think it would be better to have consistent signatures in SparkR. > Make ML APIs in SparkR consistent > - > > Key: SPARK-14831 > URL: https://issues.apache.org/jira/browse/SPARK-14831 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng >Priority: Critical > > In current master, we have 4 ML methods in SparkR: > {code:none} > glm(formula, family, data, ...) > kmeans(data, centers, ...) > naiveBayes(formula, data, ...) > survreg(formula, data, ...) > {code} > We tried to keep the signatures similar to existing ones in R. However, if we > put them together, they are not consistent. One example is k-means, which > doesn't accept a formula. Instead of looking at each method independently, we > might want to update the signature of kmeans to > {code:none} > kmeans(formula, data, centers, ...) > {code} > We can also discuss possible global changes here. For example, `glm` puts > `family` before `data` while `kmeans` puts `centers` after `data`. This is > not consistent. And logically, the formula doesn't mean anything without > associating with a DataFrame. So it makes more sense to me to have the > following signature: > {code:none} > algorithm(data, formula, [required params], [optional params]) > {code} > If we make this change, we might want to avoid name collisions because they > have different signatures. 
We can use `ml.kmeans`, `ml.glm`, etc. > Sorry for discussing API changes at the last minute. But I think it would be > better to have consistent signatures in SparkR. > cc: [~shivaram] [~josephkb] [~yanboliang] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14716) Add partitioned parquet support file stream sink
[ https://issues.apache.org/jira/browse/SPARK-14716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das reassigned SPARK-14716: - Assignee: Tathagata Das > Add partitioned parquet support file stream sink > > > Key: SPARK-14716 > URL: https://issues.apache.org/jira/browse/SPARK-14716 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14555) Python API for methods introduced for Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-14555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das reassigned SPARK-14555: - Assignee: Burak Yavuz > Python API for methods introduced for Structured Streaming > -- > > Key: SPARK-14555 > URL: https://issues.apache.org/jira/browse/SPARK-14555 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Streaming >Reporter: Burak Yavuz >Assignee: Burak Yavuz > Fix For: 2.0.0 > > > Methods added for Structured Streaming don't have a Python API yet. > We need to provide APIs for the new methods in: > - DataFrameReader > - DataFrameWriter > - ContinuousQuery > - Trigger -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14831) Make ML APIs in SparkR consistent
Xiangrui Meng created SPARK-14831: - Summary: Make ML APIs in SparkR consistent Key: SPARK-14831 URL: https://issues.apache.org/jira/browse/SPARK-14831 Project: Spark Issue Type: Improvement Components: ML, SparkR Affects Versions: 2.0.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Critical In current master, we have 4 ML methods in SparkR: {code:none} glm(formula, family, data, ...) kmeans(data, centers, ...) naiveBayes(formula, data, ...) survreg(formula, data, ...) {code} We tried to keep the signatures similar to existing ones in R. However, if we put them together, they are not consistent. One example is k-means, which doesn't accept a formula. Instead of looking at each method independently, we might want to update the signature of kmeans to {code:none} kmeans(formula, data, centers, ...) {code} We can also discuss possible global changes here. For example, `glm` puts `family` before `data` while `kmeans` puts `centers` after `data`. This is not consistent. And logically, the formula doesn't mean anything without associating with a DataFrame. So it makes more sense to me to have the following signature: {code:none} algorithm(data, formula, [required params], [optional params]) {code} If we make this change, we might want to avoid name collisions because they have different signatures. We can use `ml.kmeans`, `ml.glm`, etc. Sorry for discussing API changes at the last minute. But I think it would be better to have consistent signatures in SparkR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12488) LDA describeTopics() Generates Invalid Term IDs
[ https://issues.apache.org/jira/browse/SPARK-12488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253095#comment-15253095 ] Joseph K. Bradley commented on SPARK-12488: --- Ping [~ilganeli] I hope this is fixed now! > LDA describeTopics() Generates Invalid Term IDs > --- > > Key: SPARK-12488 > URL: https://issues.apache.org/jira/browse/SPARK-12488 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.5.2 >Reporter: Ilya Ganelin > > When running the LDA model, and using the describeTopics function, invalid > values appear in the termID list that is returned: > The below example generates 10 topics on a data set with a vocabulary of 685. > {code} > // Set LDA parameters > val numTopics = 10 > val lda = new LDA().setK(numTopics).setMaxIterations(10) > val ldaModel = lda.run(docTermVector) > val distModel = > ldaModel.asInstanceOf[org.apache.spark.mllib.clustering.DistributedLDAModel] > {code} > {code} > scala> ldaModel.describeTopics()(0)._1.sorted.reverse > res40: Array[Int] = Array(2064860663, 2054149956, 1991041659, 1986948613, > 1962816105, 1858775243, 1842920256, 1799900935, 1792510791, 1792371944, > 1737877485, 1712816533, 1690397927, 1676379181, 1664181296, 1501782385, > 1274389076, 1260230987, 1226545007, 1213472080, 1068338788, 1050509279, > 714524034, 678227417, 678227086, 624763822, 624623852, 618552479, 616917682, > 551612860, 453929488, 371443786, 183302140, 58762039, 42599819, 9947563, 617, > 616, 615, 612, 603, 597, 596, 595, 594, 593, 592, 591, 590, 589, 588, 587, > 586, 585, 584, 583, 582, 581, 580, 579, 578, 577, 576, 575, 574, 573, 572, > 571, 570, 569, 568, 567, 566, 565, 564, 563, 562, 561, 560, 559, 558, 557, > 556, 555, 554, 553, 552, 551, 550, 549, 548, 547, 546, 545, 544, 543, 542, > 541, 540, 539, 538, 537, 536, 535, 534, 533, 532, 53... 
> {code} > {code} > scala> ldaModel.describeTopics()(0)._1.sorted > res41: Array[Int] = Array(-2087809139, -2001127319, -1979718998, -1833443915, > -1811530305, -1765302237, -1668096260, -1527422175, -1493838005, -1452770216, > -1452508395, -1452502074, -1452277147, -1451720206, -1450928740, -1450237612, > -1448730073, -1437852514, -1420883015, -1418557080, -1397997340, -1397995485, > -1397991169, -1374921919, -1360937376, -1360533511, -1320627329, -1314475604, > -1216400643, -1210734882, -1107065297, -1063529036, -1062984222, -1042985412, > -1009109620, -951707740, -894644371, -799531743, -627436045, -586317106, > -563544698, -326546674, -174108802, -155900771, -80887355, -78916591, > -26690004, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, > 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, > 38, 39, 40, 41, 42, 43, 44, 45, 4... > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14802) Disable Passing to Hive the queries that can't be parsed
[ https://issues.apache.org/jira/browse/SPARK-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253093#comment-15253093 ] Reynold Xin commented on SPARK-14802: - +1 > Disable Passing to Hive the queries that can't be parsed > > > Key: SPARK-14802 > URL: https://issues.apache.org/jira/browse/SPARK-14802 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > When hitting the query that can't be parsed, we pass it to Hive. Thus, we hit > some strange error messages from Hive. We should disable it after we have > integrated the SparkSqlParser & HiveSqlParser. > For example, > {code} > NoViableAltException(302@[192:1: tableName : (db= identifier DOT tab= > identifier -> ^( TOK_TABNAME $db $tab) |tab= identifier -> ^( TOK_TABNAME > $tab) );]) > at org.antlr.runtime.DFA.noViableAlt(DFA.java:158) > at org.antlr.runtime.DFA.predict(DFA.java:116) > at > org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.tableName(HiveParser_FromClauseParser.java:4747) > at > org.apache.hadoop.hive.ql.parse.HiveParser.tableName(HiveParser.java:45920) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14777) Finally, merge HiveSqlAstBuilder and SparkSqlAstBuilder
[ https://issues.apache.org/jira/browse/SPARK-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-14777. - Resolution: Fixed Assignee: Reynold Xin Fix Version/s: 2.0.0 > Finally, merge HiveSqlAstBuilder and SparkSqlAstBuilder > --- > > Key: SPARK-14777 > URL: https://issues.apache.org/jira/browse/SPARK-14777 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14821) Move analyze table parsing into SparkSqlAstBuilder
[ https://issues.apache.org/jira/browse/SPARK-14821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-14821. - Resolution: Fixed Fix Version/s: 2.0.0 > Move analyze table parsing into SparkSqlAstBuilder > -- > > Key: SPARK-14821 > URL: https://issues.apache.org/jira/browse/SPARK-14821 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14479) GLM supports output link prediction
[ https://issues.apache.org/jira/browse/SPARK-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-14479. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12287 [https://github.com/apache/spark/pull/12287] > GLM supports output link prediction > --- > > Key: SPARK-14479 > URL: https://issues.apache.org/jira/browse/SPARK-14479 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Reporter: Yanbo Liang > Fix For: 2.0.0 > > > In R glm and glmnet, the default type of predict is "link" which is the > linear predictor, users can specify "type = response" to output response > prediction. Currently the ML glm predict will output "response" prediction by > default, I think it's more reasonable. Should we change the default type of > ML glm predict output? > R glm: > https://stat.ethz.ch/R-manual/R-devel/library/stats/html/predict.glm.html > R glmnet: http://www.inside-r.org/packages/cran/glmnet/docs/predict.glmnet > Meanwhile, we should decide the default type of glm predict output in SparkR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
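The distinction the ticket above discusses is between the linear predictor ("link" prediction, eta) and the transformed mean ("response" prediction, mu = g^-1(eta)). A minimal standalone sketch for a binomial GLM with the logit link (hypothetical helper names, not the SparkR or ML API):

```scala
// Sketch: "link" vs "response" prediction for a logistic-regression GLM.
object GlmPredictSketch {
  // Linear predictor: eta = intercept + w . x  (R's type = "link")
  def predictLink(w: Array[Double], intercept: Double, x: Array[Double]): Double =
    intercept + w.zip(x).map { case (wi, xi) => wi * xi }.sum

  // Mean response via the inverse logit link: mu = 1 / (1 + exp(-eta))
  // (R's type = "response"; what ML's glm predict outputs by default)
  def predictResponse(w: Array[Double], intercept: Double, x: Array[Double]): Double =
    1.0 / (1.0 + math.exp(-predictLink(w, intercept, x)))
}
```

For example, an eta of 0 on the link scale corresponds to a response of 0.5, which illustrates why the two prediction types can look very different to callers.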
[jira] [Updated] (SPARK-14479) GLM supports output link prediction
[ https://issues.apache.org/jira/browse/SPARK-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-14479: -- Assignee: Yanbo Liang > GLM supports output link prediction > --- > > Key: SPARK-14479 > URL: https://issues.apache.org/jira/browse/SPARK-14479 > Project: Spark > Issue Type: Improvement > Components: ML, SparkR >Reporter: Yanbo Liang >Assignee: Yanbo Liang > Fix For: 2.0.0 > > > In R glm and glmnet, the default type of predict is "link" which is the > linear predictor, users can specify "type = response" to output response > prediction. Currently the ML glm predict will output "response" prediction by > default, I think it's more reasonable. Should we change the default type of > ML glm predict output? > R glm: > https://stat.ethz.ch/R-manual/R-devel/library/stats/html/predict.glm.html > R glmnet: http://www.inside-r.org/packages/cran/glmnet/docs/predict.glmnet > Meanwhile, we should decide the default type of glm predict output in SparkR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14830) Add RemoveRepetitionFromGroupExpressions optimizer
[ https://issues.apache.org/jira/browse/SPARK-14830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14830: Assignee: Apache Spark > Add RemoveRepetitionFromGroupExpressions optimizer > -- > > Key: SPARK-14830 > URL: https://issues.apache.org/jira/browse/SPARK-14830 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Reporter: Dongjoon Hyun >Assignee: Apache Spark > > This issue aims to optimize GroupExpressions by removing repeating > expressions. > **Before** > {code} > scala> sql("select a from (select explode(array(1,2)) a) T group by a, a, > a").explain() > == Physical Plan == > WholeStageCodegen > : +- TungstenAggregate(key=[a#5,a#5,a#5], functions=[], output=[a#5]) > : +- INPUT > +- Exchange hashpartitioning(a#5, a#5, a#5, 200), None >+- WholeStageCodegen > : +- TungstenAggregate(key=[a#5,a#5,a#5], functions=[], > output=[a#5,a#5,a#5]) > : +- INPUT > +- Generate explode([1,2]), false, false, [a#5] > +- Scan OneRowRelation[] > {code} > **After** > {code} > scala> sql("select a from (select explode(array(1,2)) a) T group by a, a, > a").explain() > == Physical Plan == > WholeStageCodegen > : +- TungstenAggregate(key=[a#5], functions=[], output=[a#5]) > : +- INPUT > +- Exchange hashpartitioning(a#5, 200), None >+- WholeStageCodegen > : +- TungstenAggregate(key=[a#5], functions=[], output=[a#5]) > : +- INPUT > +- Generate explode([1,2]), false, false, [a#5] > +- Scan OneRowRelation[] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14830) Add RemoveRepetitionFromGroupExpressions optimizer
[ https://issues.apache.org/jira/browse/SPARK-14830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253047#comment-15253047 ] Apache Spark commented on SPARK-14830: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/12590 > Add RemoveRepetitionFromGroupExpressions optimizer > -- > > Key: SPARK-14830 > URL: https://issues.apache.org/jira/browse/SPARK-14830 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Reporter: Dongjoon Hyun > > This issue aims to optimize GroupExpressions by removing repeating > expressions. > **Before** > {code} > scala> sql("select a from (select explode(array(1,2)) a) T group by a, a, > a").explain() > == Physical Plan == > WholeStageCodegen > : +- TungstenAggregate(key=[a#5,a#5,a#5], functions=[], output=[a#5]) > : +- INPUT > +- Exchange hashpartitioning(a#5, a#5, a#5, 200), None >+- WholeStageCodegen > : +- TungstenAggregate(key=[a#5,a#5,a#5], functions=[], > output=[a#5,a#5,a#5]) > : +- INPUT > +- Generate explode([1,2]), false, false, [a#5] > +- Scan OneRowRelation[] > {code} > **After** > {code} > scala> sql("select a from (select explode(array(1,2)) a) T group by a, a, > a").explain() > == Physical Plan == > WholeStageCodegen > : +- TungstenAggregate(key=[a#5], functions=[], output=[a#5]) > : +- INPUT > +- Exchange hashpartitioning(a#5, 200), None >+- WholeStageCodegen > : +- TungstenAggregate(key=[a#5], functions=[], output=[a#5]) > : +- INPUT > +- Generate explode([1,2]), false, false, [a#5] > +- Scan OneRowRelation[] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14830) Add RemoveRepetitionFromGroupExpressions optimizer
[ https://issues.apache.org/jira/browse/SPARK-14830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14830: Assignee: (was: Apache Spark) > Add RemoveRepetitionFromGroupExpressions optimizer > -- > > Key: SPARK-14830 > URL: https://issues.apache.org/jira/browse/SPARK-14830 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Reporter: Dongjoon Hyun > > This issue aims to optimize GroupExpressions by removing repeating > expressions. > **Before** > {code} > scala> sql("select a from (select explode(array(1,2)) a) T group by a, a, > a").explain() > == Physical Plan == > WholeStageCodegen > : +- TungstenAggregate(key=[a#5,a#5,a#5], functions=[], output=[a#5]) > : +- INPUT > +- Exchange hashpartitioning(a#5, a#5, a#5, 200), None >+- WholeStageCodegen > : +- TungstenAggregate(key=[a#5,a#5,a#5], functions=[], > output=[a#5,a#5,a#5]) > : +- INPUT > +- Generate explode([1,2]), false, false, [a#5] > +- Scan OneRowRelation[] > {code} > **After** > {code} > scala> sql("select a from (select explode(array(1,2)) a) T group by a, a, > a").explain() > == Physical Plan == > WholeStageCodegen > : +- TungstenAggregate(key=[a#5], functions=[], output=[a#5]) > : +- INPUT > +- Exchange hashpartitioning(a#5, 200), None >+- WholeStageCodegen > : +- TungstenAggregate(key=[a#5], functions=[], output=[a#5]) > : +- INPUT > +- Generate explode([1,2]), false, false, [a#5] > +- Scan OneRowRelation[] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
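The optimization in the plans above boils down to deduplicating the grouping expressions before planning, so that `GROUP BY a, a, a` aggregates and partitions on a single key. A minimal sketch of that rule (hypothetical names; the real rule rewrites Catalyst `Aggregate` nodes and compares expressions semantically):

```scala
// Sketch: collapse repeated grouping expressions, preserving first-seen order.
object RemoveRepetitionSketch {
  // Stand-in for a Catalyst expression, identified here by its display id.
  final case class Expr(id: String)

  // Seq.distinct keeps the first occurrence of each expression and drops
  // later repeats, mirroring what the optimizer rule should produce.
  def dedupGroupingKeys(keys: Seq[Expr]): Seq[Expr] = keys.distinct
}
```

Applied to the example in the issue, the grouping key list `[a#5, a#5, a#5]` reduces to `[a#5]`, which is what shrinks both the `TungstenAggregate` keys and the `hashpartitioning` columns in the "After" plan.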
[jira] [Assigned] (SPARK-14828) Start SparkSession in REPL instead of SQLContext
[ https://issues.apache.org/jira/browse/SPARK-14828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14828: Assignee: Apache Spark (was: Andrew Or) > Start SparkSession in REPL instead of SQLContext > > > Key: SPARK-14828 > URL: https://issues.apache.org/jira/browse/SPARK-14828 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14828) Start SparkSession in REPL instead of SQLContext
[ https://issues.apache.org/jira/browse/SPARK-14828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14828: Assignee: Andrew Or (was: Apache Spark) > Start SparkSession in REPL instead of SQLContext > > > Key: SPARK-14828 > URL: https://issues.apache.org/jira/browse/SPARK-14828 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org