[jira] [Commented] (SPARK-2973) Add a way to show tables without executing a job

2015-02-18 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326247#comment-14326247
 ] 

Yin Huai commented on SPARK-2973:
-

I just tried our master. sql("show tables").collect() will not start a job.
However, sql("show tables").take(1) will start a job, because our overridden
executeTake in ExecutedCommand is not called in this case.

The reason is that DataFrame.take(1) calls DataFrame.head(1), and head then
calls limit(1).collect(). Inside limit, we create a DataFrame with
Limit(Literal(1), ExecutedCommand(ShowTablesCommand)) as the logicalPlan. When
we create the DataFrame for Limit, because ExecutedCommand is a command, we
create a LogicalRDD (see
[here|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameImpl.scala#L77])
and call queryExecution.toRDD of this ExecutedCommand. The queryExecution of
sql("show tables").limit(1) will be
{code}
== Parsed Logical Plan ==
Limit 1
 LogicalRDD [tableName#10,isTemporary#11], ParallelCollectionRDD[7] at parallelize at commands.scala:65

== Analyzed Logical Plan ==
Limit 1
 LogicalRDD [tableName#10,isTemporary#11], ParallelCollectionRDD[7] at parallelize at commands.scala:65

== Optimized Logical Plan ==
Limit 1
 LogicalRDD [tableName#10,isTemporary#11], ParallelCollectionRDD[7] at parallelize at commands.scala:65

== Physical Plan ==
Limit 1
 PhysicalRDD [tableName#10,isTemporary#11], ParallelCollectionRDD[7] at parallelize at commands.scala:65
{code}

So, Limit.executeCollect will call PhysicalRDD.executeTake and then trigger a 
job execution.
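
The call path described above can be illustrated with a small Scala sketch. This is a toy model under stated assumptions, not Spark's code: CommandPlan, RddPlan, LimitedPlan, LimitJobs and LimitDemo are hypothetical stand-ins for ExecutedCommand, PhysicalRDD, Limit and the scheduler, and rows are plain strings.

```scala
// Toy model only, not Spark's real classes; all names here are hypothetical.
// A counter stands in for the scheduler so we can see when a "job" would run.
object LimitJobs { var started: Int = 0 }

sealed trait LimitPlan {
  def executeTake(n: Int): Array[String]
}

// Stand-in for ExecutedCommand: take() is served from driver-side rows.
final case class CommandPlan(rows: Seq[String]) extends LimitPlan {
  def executeTake(n: Int): Array[String] = rows.take(n).toArray
}

// Stand-in for PhysicalRDD: take() has to go through the scheduler.
final case class RddPlan(rows: Seq[String]) extends LimitPlan {
  def executeTake(n: Int): Array[String] = {
    LimitJobs.started += 1
    rows.take(n).toArray
  }
}

// Stand-in for Limit: collecting a limit delegates to the child's executeTake.
final case class LimitedPlan(n: Int, child: LimitPlan) extends LimitPlan {
  def executeTake(m: Int): Array[String] = child.executeTake(math.min(n, m))
  def executeCollect(): Array[String] = child.executeTake(n)
}

object LimitDemo {
  // Mimics limit(n) on a command: the command's rows are re-wrapped as an
  // RDD-backed node, so the command's own executeTake is never reached.
  def limit(cmd: CommandPlan, n: Int): LimitedPlan =
    LimitedPlan(n, RddPlan(cmd.rows))
}
```

In this model, calling executeTake directly on CommandPlan leaves the counter untouched, while LimitDemo.limit(cmd, 1).executeCollect() increments it once, which is the behaviour reported for take(1).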

 Add a way to show tables without executing a job
 

 Key: SPARK-2973
 URL: https://issues.apache.org/jira/browse/SPARK-2973
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Aaron Davidson
Assignee: Cheng Lian
Priority: Critical
 Fix For: 1.2.0


 Right now, sql("show tables").collect() will start a Spark job which shows up
 in the UI. There should be a way to get these without this step.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2973) Add a way to show tables without executing a job

2014-12-22 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256140#comment-14256140
 ] 

Michael Armbrust commented on SPARK-2973:
-

I think the confusion there would be if someone then ran .map(...) on that RDD.
It would be pretty confusing if it did not run a Spark job. What is wrong
with the approach we are already using for executeCollect()? We can add an
executeTake with a default implementation and override that in ExecutedCommand.
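
A minimal Scala sketch of that shape follows: a default executeTake on the base plan, overridden by the command. It is a toy model, not Spark's implementation. SketchPlan, SketchScan, SketchCommand and SketchJobs are hypothetical names, rows are plain strings, and a counter stands in for the scheduler.

```scala
// Toy model only, not Spark's real classes; all names here are hypothetical.
object SketchJobs { var started: Int = 0 }

abstract class SketchPlan {
  // Stand-in for execute(): in Spark this would return an RDD.
  def execute(): Seq[String]

  // Default implementation: taking rows from execute() counts as one job.
  def executeTake(n: Int): Array[String] = {
    SketchJobs.started += 1
    execute().take(n).toArray
  }
}

// An ordinary operator keeps the default and therefore "runs a job".
final class SketchScan(rows: Seq[String]) extends SketchPlan {
  def execute(): Seq[String] = rows
}

// A command computes its result once on the driver and serves take() locally.
final class SketchCommand(run: () => Seq[String]) extends SketchPlan {
  private lazy val result: Seq[String] = run()
  def execute(): Seq[String] = result
  override def executeTake(n: Int): Array[String] = result.take(n).toArray
}
```

With this layout, execute() still behaves like any other operator, so a later .map(...) on its output would go through the normal path; only the collect-style entry points are short-circuited.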




[jira] [Commented] (SPARK-2973) Add a way to show tables without executing a job

2014-12-21 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255152#comment-14255152
 ] 

Cheng Lian commented on SPARK-2973:
---

How about adding a special {{LocalRDD}} that inherits {{RDD}} but doesn't 
trigger any distributed jobs? Essentially it's just a normal {{Seq}} but 
behaves like an {{RDD}}. Then we can let {{HiveNativeCommand}} (or 
{{NativeCommand}} in branch-1.2) return a {{LocalRDD}} in {{.execute()}}.
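
To illustrate the idea, here is a rough Scala sketch of a {{Seq}}-backed collection with an RDD-like surface. It is only a toy: LocalDataset is a hypothetical name, it does not extend Spark's {{RDD}}, and the chosen method set is an assumption made for illustration.

```scala
// Toy model only: a Seq-backed collection with an RDD-like surface.
// It does not extend Spark's RDD; the class name and methods are hypothetical.
final class LocalDataset[T](private val data: Seq[T]) {
  // Actions read straight from the in-memory Seq on the driver.
  def collect(): Seq[T] = data
  def take(n: Int): Seq[T] = data.take(n)
  def count(): Long = data.size.toLong

  // Transformations also stay local, so nothing is ever scheduled.
  def map[U](f: T => U): LocalDataset[U] = new LocalDataset(data.map(f))
}
```

Note that in this sketch map also runs on the driver, so no operation on such a collection would ever show up as a job.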




[jira] [Commented] (SPARK-2973) Add a way to show tables without executing a job

2014-12-21 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255154#comment-14255154
 ] 

Cheng Lian commented on SPARK-2973:
---

(Sorry for spamming the comments, experienced some network issue here...)




[jira] [Commented] (SPARK-2973) Add a way to show tables without executing a job

2014-12-19 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254006#comment-14254006
 ] 

Michael Armbrust commented on SPARK-2973:
-

I think the solution here is to also special-case take in SparkPlan and use
that from SchemaRDD.




[jira] [Commented] (SPARK-2973) Add a way to show tables without executing a job

2014-08-13 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095241#comment-14095241
 ] 

Michael Armbrust commented on SPARK-2973:
-

We can just override executeCollect() in Commands.
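
A minimal Scala sketch of what such an override could look like is below. It is a toy model rather than Spark's code: ToyPlan, ToyShowTables, ToyJobs and CollectDemo are hypothetical names, rows are plain strings, and a counter stands in for the scheduler.

```scala
// Toy model only: these are NOT Spark's real classes; names are hypothetical.
// A counter stands in for the scheduler so we can see when a "job" would run.
object ToyJobs { var started: Int = 0 }

trait ToyPlan {
  // Stand-in for execute(), which would produce an RDD in Spark.
  def execute(): Seq[String]

  // Default collect path: materialising execute() counts as one job.
  def executeCollect(): Array[String] = {
    ToyJobs.started += 1
    execute().toArray
  }
}

// A command whose rows are already available on the driver.
final case class ToyShowTables(tables: Seq[String]) extends ToyPlan {
  def execute(): Seq[String] = tables

  // The override: hand back the local rows without counting a job.
  override def executeCollect(): Array[String] = tables.toArray
}

object CollectDemo {
  def main(args: Array[String]): Unit = {
    val rows = ToyShowTables(Seq("people", "orders")).executeCollect()
    println(rows.mkString(", "))
    println(s"jobs started: ${ToyJobs.started}")
  }
}
```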
