[jira] [Commented] (SPARK-2973) Add a way to show tables without executing a job
[ https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326247#comment-14326247 ]

Yin Huai commented on SPARK-2973:

I just tried our master. sql("show tables").collect() will not start a job. However, sql("show tables").take(1) will start a job, because our overridden executeTake in ExecutedCommand is not called in this case.

The reason is that DataFrame.take(1) calls DataFrame.head(1), and head then calls limit(1).collect(). Inside limit, we create a DataFrame with Limit(Literal(1), ExecutedCommand(ShowTablesCommand)) as the logicalPlan. When we create the DataFrame for Limit, because ExecutedCommand is a command, we create a LogicalRDD (see [here|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameImpl.scala#L77]) and call queryExecution.toRDD of this ExecutedCommand. The queryExecution of sql("show tables").limit(1) will be
{code}
== Parsed Logical Plan ==
Limit 1
 LogicalRDD [tableName#10,isTemporary#11], ParallelCollectionRDD[7] at parallelize at commands.scala:65

== Analyzed Logical Plan ==
Limit 1
 LogicalRDD [tableName#10,isTemporary#11], ParallelCollectionRDD[7] at parallelize at commands.scala:65

== Optimized Logical Plan ==
Limit 1
 LogicalRDD [tableName#10,isTemporary#11], ParallelCollectionRDD[7] at parallelize at commands.scala:65

== Physical Plan ==
Limit 1
 PhysicalRDD [tableName#10,isTemporary#11], ParallelCollectionRDD[7] at parallelize at commands.scala:65
{code}
So, Limit.executeCollect will call PhysicalRDD.executeTake and then trigger a job execution.

Add a way to show tables without executing a job

Key: SPARK-2973
URL: https://issues.apache.org/jira/browse/SPARK-2973
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Aaron Davidson
Assignee: Cheng Lian
Priority: Critical
Fix For: 1.2.0

Right now, sql("show tables").collect() will start a Spark job which shows up in the UI. There should be a way to get these without this step.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
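The call chain described in this comment can be modelled in a few lines of plain Scala. This is only a toy sketch, not Spark code: Plan, Command, RddScan, Limit, wrapForNewDataFrame and the jobsStarted counter are hypothetical stand-ins chosen for illustration, and treating every action on an RDD-backed scan as exactly one job is a simplifying assumption.

```scala
// Toy model of the behaviour described above. None of these types are Spark's.
object LimitOverCommandSketch {
  // Counts the "jobs" the toy scheduler has launched.
  var jobsStarted = 0

  sealed trait Plan {
    def executeCollect(): Seq[String]
    def executeTake(n: Int): Seq[String]
  }

  // Stands in for ExecutedCommand: answers on the driver, never starts a job.
  final case class Command(rows: Seq[String]) extends Plan {
    def executeCollect(): Seq[String] = rows
    def executeTake(n: Int): Seq[String] = rows.take(n)
  }

  // Stands in for PhysicalRDD over an existing RDD: every action counts as a job.
  final case class RddScan(rows: Seq[String]) extends Plan {
    def executeCollect(): Seq[String] = { jobsStarted += 1; rows }
    def executeTake(n: Int): Seq[String] = { jobsStarted += 1; rows.take(n) }
  }

  // Stands in for Limit: collecting a limit delegates to the child's executeTake.
  final case class Limit(n: Int, child: Plan) extends Plan {
    def executeCollect(): Seq[String] = child.executeTake(n)
    def executeTake(k: Int): Seq[String] = child.executeTake(math.min(n, k))
  }

  // Assumption mirroring the comment: building a new DataFrame on top of a
  // command replaces the command with an RDD-backed scan of its output.
  def wrapForNewDataFrame(plan: Plan): Plan = plan match {
    case Command(rows) => RddScan(rows)
    case other         => other
  }

  // collect() runs the plan directly, so the command's override is used.
  def collect(plan: Plan): Seq[String] = plan.executeCollect()

  // take(n) goes through limit(n).collect(), so the override is bypassed.
  def take(plan: Plan, n: Int): Seq[String] =
    Limit(n, wrapForNewDataFrame(plan)).executeCollect()
}
```

In this model, collect on a Command leaves jobsStarted at 0, while take(plan, 1) increments it, because Limit only ever sees the RddScan wrapper and never the Command underneath.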
[jira] [Commented] (SPARK-2973) Add a way to show tables without executing a job
[ https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256140#comment-14256140 ]

Michael Armbrust commented on SPARK-2973:

I think the confusion there would arise if someone then ran .map(...) on that RDD. It would be pretty confusing if it did not run a Spark job.

What is wrong with the approach we are already using for executeCollect()? We can add an executeTake with a default implementation and override it in ExecutedCommand.
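A minimal sketch of that suggestion, in plain Scala with no Spark dependency. The Plan trait, the Scan and Command classes and the jobsStarted counter are hypothetical names used only to illustrate the pattern of a default executeTake that a command overrides; the real SparkPlan and ExecutedCommand signatures may differ.

```scala
// Toy model of "default implementation plus override". Not Spark's API.
object ExecuteTakeSketch {
  // Counts the "jobs" the toy scheduler has launched.
  var jobsStarted = 0

  trait Plan {
    // The rows this operator produces.
    def rows: Seq[String]

    // Default implementations: both go through the job-launching path.
    def executeCollect(): Seq[String] = { jobsStarted += 1; rows }
    def executeTake(n: Int): Seq[String] = { jobsStarted += 1; rows.take(n) }
  }

  // An ordinary operator that keeps the job-launching defaults.
  final case class Scan(rows: Seq[String]) extends Plan

  // A command: its result is computed once on the driver and cached,
  // so both collect and take can be answered without a job.
  final class Command(run: () => Seq[String]) extends Plan {
    lazy val rows: Seq[String] = run()
    override def executeCollect(): Seq[String] = rows
    override def executeTake(n: Int): Seq[String] = rows.take(n)
  }
}
```

With this shape, callers that invoke executeTake directly on the command avoid a job, while every other operator keeps the default behaviour unchanged.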
[jira] [Commented] (SPARK-2973) Add a way to show tables without executing a job
[ https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255152#comment-14255152 ]

Cheng Lian commented on SPARK-2973:

How about adding a special {{LocalRDD}} that inherits {{RDD}} but doesn't trigger any distributed jobs? Essentially it's just a normal {{Seq}} but behaves like an {{RDD}}. Then we can let {{HiveNativeCommand}} (or {{NativeCommand}} in branch-1.2) return a {{LocalRDD}} in {{.execute()}}.
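To make the idea concrete, here is a rough sketch in plain Scala. It deliberately does not extend Spark's real RDD class: MiniRdd, DistributedRdd, LocalRdd and the jobsStarted counter are hypothetical, and the "distributed" variant simply counts a job per action instead of scheduling anything.

```scala
// Toy model of a Seq-backed collection that looks like an RDD. Not Spark's API.
object LocalRddSketch {
  // Counts the "jobs" the toy scheduler has launched.
  var jobsStarted = 0

  // A tiny subset of an RDD-like interface.
  trait MiniRdd[A] {
    def collect(): Seq[A]
    def take(n: Int): Seq[A]
    def map[B](f: A => B): MiniRdd[B]
  }

  // Every action counts as a job. The transformation is applied eagerly
  // here for brevity; a real RDD would be lazy.
  final class DistributedRdd[A](data: Seq[A]) extends MiniRdd[A] {
    def collect(): Seq[A] = { jobsStarted += 1; data }
    def take(n: Int): Seq[A] = { jobsStarted += 1; data.take(n) }
    def map[B](f: A => B): MiniRdd[B] = new DistributedRdd(data.map(f))
  }

  // Same interface, backed by a local Seq: no action ever starts a job.
  final class LocalRdd[A](data: Seq[A]) extends MiniRdd[A] {
    def collect(): Seq[A] = data
    def take(n: Int): Seq[A] = data.take(n)
    def map[B](f: A => B): MiniRdd[B] = new LocalRdd(data.map(f))
  }
}
```

Note that in this sketch map on a LocalRdd also stays local, so a later collect still runs no job.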
[jira] [Commented] (SPARK-2973) Add a way to show tables without executing a job
[ https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255154#comment-14255154 ]

Cheng Lian commented on SPARK-2973:

(Sorry for spamming the comments, experienced some network issue here...)
[jira] [Commented] (SPARK-2973) Add a way to show tables without executing a job
[ https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254006#comment-14254006 ]

Michael Armbrust commented on SPARK-2973:

I think the solution here is to also special-case take in SparkPlan and use that from SchemaRDD.
[jira] [Commented] (SPARK-2973) Add a way to show tables without executing a job
[ https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095241#comment-14095241 ]

Michael Armbrust commented on SPARK-2973:

We can just override executeCollect() in Commands.