[jira] [Commented] (SPARK-19557) Output parameters are not present in SQL Query Plan
[ https://issues.apache.org/jira/browse/SPARK-19557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862528#comment-15862528 ]

Salil Surendran commented on SPARK-19557:
-----------------------------------------

[~hyukjin.kwon] We don't have this information in the query plan, as per the discussion in this PR: https://github.com/apache/spark/pull/16664

> Output parameters are not present in SQL Query Plan
> ---------------------------------------------------
>
>                 Key: SPARK-19557
>                 URL: https://issues.apache.org/jira/browse/SPARK-19557
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Salil Surendran
>
> For DataFrameWriter methods like parquet(), json(), csv(), etc., output
> parameters are not present in the QueryExecution object. For methods like
> saveAsTable() they are.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19558) Provide a config option to attach QueryExecutionListener to SparkSession
Salil Surendran created SPARK-19558:
---------------------------------------

             Summary: Provide a config option to attach QueryExecutionListener to SparkSession
                 Key: SPARK-19558
                 URL: https://issues.apache.org/jira/browse/SPARK-19558
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Salil Surendran

Provide a configuration property (just like spark.extraListeners) to attach a QueryExecutionListener to a SparkSession.
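A minimal sketch of what the proposal might look like in practice. The listener class below and the property name `spark.sql.queryExecutionListeners` are assumptions for illustration; the JIRA only asks for a spark.extraListeners-style mechanism, and the actual property name would be decided in the PR:

```scala
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Hypothetical listener that the proposed configuration property would attach.
// Must have a zero-argument constructor so it can be instantiated by name.
class MetricsListener extends QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    println(s"$funcName succeeded in ${durationNs / 1e6} ms")

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
    println(s"$funcName failed: ${exception.getMessage}")
}

// spark-defaults.conf (hypothetical property name, mirroring spark.extraListeners):
//   spark.sql.queryExecutionListeners  com.example.MetricsListener
```

The value of doing this via configuration rather than code is that metrics collection can be attached to every SparkSession in a deployment without modifying user applications.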
[jira] [Created] (SPARK-19557) Output parameters are not present in SQL Query Plan
Salil Surendran created SPARK-19557:
---------------------------------------

             Summary: Output parameters are not present in SQL Query Plan
                 Key: SPARK-19557
                 URL: https://issues.apache.org/jira/browse/SPARK-19557
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Salil Surendran

For DataFrameWriter methods like parquet(), json(), csv(), etc., output parameters are not present in the QueryExecution object. For methods like saveAsTable() they are.
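The asymmetry the issue describes can be sketched in a spark-shell session (paths and table name are illustrative; `spark` is the usual shell-provided SparkSession):

```scala
// Both writes go through DataFrameWriter, but per this issue only the
// saveAsTable() path surfaces its output target in the QueryExecution
// object that a QueryExecutionListener receives.
val df = spark.range(10).toDF("id")

// Path-based write: the output path "/tmp/out.parquet" is not visible
// in the query plan reported to listeners.
df.write.mode("overwrite").parquet("/tmp/out.parquet")

// Table-based write: the destination table appears in the plan.
df.write.mode("overwrite").saveAsTable("out_table")
```

For lineage and auditing tools built on QueryExecutionListener, this means path-based writes cannot be attributed to their destination, which is the motivation behind the linked PR discussion.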
[jira] [Commented] (SPARK-18120) QueryExecutionListener methods don't get executed for DataFrameWriter methods
[ https://issues.apache.org/jira/browse/SPARK-18120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831719#comment-15831719 ]

Salil Surendran commented on SPARK-18120:
-----------------------------------------

Sorry for the delay, but I ran into some unexpected unit test failures. Rerunning the whole test suite to make sure nothing is broken.

> QueryExecutionListener methods don't get executed for DataFrameWriter methods
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-18120
>                 URL: https://issues.apache.org/jira/browse/SPARK-18120
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.1
>            Reporter: Salil Surendran
>
> QueryExecutionListener is a class with methods named onSuccess() and
> onFailure() that get called when a query is executed. Each of those methods
> takes a QueryExecution object as a parameter, which can be used for metrics
> analysis. The listener is invoked for several Dataset methods like take, head,
> first, collect, etc., but is not called for any of the DataFrameWriter
> methods like saveAsTable, save, etc.
[jira] [Commented] (SPARK-18120) QueryExecutionListener methods don't get executed for DataFrameWriter methods
[ https://issues.apache.org/jira/browse/SPARK-18120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830630#comment-15830630 ]

Salil Surendran commented on SPARK-18120:
-----------------------------------------

[~thomastechs] I will be making a PR today.

> QueryExecutionListener methods don't get executed for DataFrameWriter methods
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-18120
>                 URL: https://issues.apache.org/jira/browse/SPARK-18120
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.1
>            Reporter: Salil Surendran
>
> QueryExecutionListener is a class with methods named onSuccess() and
> onFailure() that get called when a query is executed. Each of those methods
> takes a QueryExecution object as a parameter, which can be used for metrics
> analysis. The listener is invoked for several Dataset methods like take, head,
> first, collect, etc., but is not called for any of the DataFrameWriter
> methods like saveAsTable, save, etc.
[jira] [Created] (SPARK-18889) Spark incorrectly reads default columns from a Hive view
Salil Surendran created SPARK-18889:
---------------------------------------

             Summary: Spark incorrectly reads default columns from a Hive view
                 Key: SPARK-18889
                 URL: https://issues.apache.org/jira/browse/SPARK-18889
             Project: Spark
          Issue Type: Bug
            Reporter: Salil Surendran

Spark fails to read a view that has columns with default names. To reproduce, follow these steps in Hive:

* CREATE TABLE IF NOT EXISTS employee_details ( eid int, name String, salary String, destination String, json String) COMMENT 'Employee details' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
* insert into employee_details values(100, "Salil", "100k", "Mumbai", '{"Foo":"ABC","Bar":"2009010110","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}');
* create view employee_25 as select eid, name, `_c4` from (select eid, name, destination, v1.foo, cast(v1.bar as timestamp) from employee_details LATERAL VIEW json_tuple(json,'Foo','Bar') v1 as foo, bar) v2;
* select * from employee_25;

You will see output like this:

+------------------+-------------------+------------------+
| employee_25.eid  | employee_25.name  | employee_25._c4  |
+------------------+-------------------+------------------+
| 100              | Salil             | NULL             |
+------------------+-------------------+------------------+

Now go to spark-shell and try to query the view:

scala> spark.sql("select * from employee_25").show
org.apache.spark.sql.AnalysisException: cannot resolve '`v2._c4`' given input columns: [foo, name, eid, bar, destination]; line 1 pos 32;
'Project [*]
+- 'SubqueryAlias employee_25
   +- 'Project [eid#56, name#57, 'v2._c4]
      +- SubqueryAlias v2
         +- Project [eid#56, name#57, destination#59, foo#61, cast(bar#62 as timestamp) AS bar#63]
            +- Generate json_tuple(json#60, Foo, Bar), true, false, v1, [foo#61, bar#62]
               +- MetastoreRelation default, employee_details

  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:308)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:308)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:307)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:269)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:279)
  at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:283)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.immutable.List.map(List.scala:285)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:283)
  at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$8.apply(QueryPlan.scala:288)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:186)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:288)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:74)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:67)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:125)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
  at scala.collection.immutable.List.foreach(List.scala:381)
[jira] [Created] (SPARK-18833) Changing partition location using the 'ALTER TABLE .. SET LOCATION' command via beeline doesn't get reflected in Spark
Salil Surendran created SPARK-18833:
---------------------------------------

             Summary: Changing partition location using the 'ALTER TABLE .. SET LOCATION' command via beeline doesn't get reflected in Spark
                 Key: SPARK-18833
                 URL: https://issues.apache.org/jira/browse/SPARK-18833
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.2
            Reporter: Salil Surendran

Use the 'ALTER TABLE' command to change the partition location of a table via beeline. spark-shell doesn't find any of the data from the table, even though the data can be read via beeline. To reproduce, do the following:

=== At hive side: ===

hive> CREATE EXTERNAL TABLE testA (id STRING, name STRING) PARTITIONED BY (idP STRING) STORED AS PARQUET LOCATION '/user/root/A/';
hive> CREATE EXTERNAL TABLE testB (id STRING, name STRING) PARTITIONED BY (idP STRING) STORED AS PARQUET LOCATION '/user/root/B/';
hive> CREATE EXTERNAL TABLE testC (id STRING, name STRING) PARTITIONED BY (idP STRING) STORED AS PARQUET LOCATION '/user/root/C/';
hive> insert into table testA PARTITION (idP='1') values ('1',"test"),('2',"test2");
hive> ALTER TABLE testB ADD IF NOT EXISTS PARTITION(idP='1');
hive> ALTER TABLE testB PARTITION (idP='1') SET LOCATION '/user/root/A/idp=1/';
hive> select * from testA;
OK
1 test 1
2 test2 1
hive> select * from testB;
OK
1 test 1
2 test2 1

Conclusion: it worked; changing the location to the place where the parquet file is present makes the data visible in Hive.

=== At spark side: ===

scala> import org.apache.spark.sql.hive.HiveContext
scala> val hiveContext = new HiveContext(sc)
scala> hiveContext.refreshTable("testB")
scala> hiveContext.sql("select * from testB").count
res2: Long = 0
scala> hiveContext.sql("ALTER TABLE testC ADD IF NOT EXISTS PARTITION(idP='1')")
res3: org.apache.spark.sql.DataFrame = [result: string]
scala> hiveContext.sql("ALTER TABLE testC PARTITION (idP='1') SET LOCATION '/user/root/A/idp=1/' ")
res4: org.apache.spark.sql.DataFrame = [result: string]
scala> hiveContext.sql("select * from testC").count
res6: Long = 0
scala> hiveContext.refreshTable("testC")
scala> hiveContext.sql("select * from testC").count
res8: Long = 0
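One way to narrow down where the repro above goes wrong is to check which location Spark's catalog actually has recorded for the partition. This is a diagnostic sketch, not part of the original report; the DESCRIBE syntax is standard HiveQL and `hiveContext` is the HiveContext created in the session above:

```scala
// Show the metadata Spark sees for the partition after the beeline-side
// ALTER TABLE. If the Location row still points at /user/root/B/, the
// metastore change has not been picked up by Spark's cached relation.
hiveContext.sql("DESCRIBE FORMATTED testB PARTITION (idP='1')").show(false)
```

Comparing this output with `DESCRIBE FORMATTED` run from beeline would show whether the discrepancy is in the metastore itself or only in Spark's caching of the relation.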
[jira] [Commented] (SPARK-18120) QueryExecutionListener methods don't get executed for DataFrameWriter methods
[ https://issues.apache.org/jira/browse/SPARK-18120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705756#comment-15705756 ]

Salil Surendran commented on SPARK-18120:
-----------------------------------------

[~r...@databricks.com] No, it doesn't get triggered. [~jayadevan.m] I already have the code written and ready. Will be making a PR soon.

> QueryExecutionListener methods don't get executed for DataFrameWriter methods
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-18120
>                 URL: https://issues.apache.org/jira/browse/SPARK-18120
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.1
>            Reporter: Salil Surendran
>
> QueryExecutionListener is a class with methods named onSuccess() and
> onFailure() that get called when a query is executed. Each of those methods
> takes a QueryExecution object as a parameter, which can be used for metrics
> analysis. The listener is invoked for several Dataset methods like take, head,
> first, collect, etc., but is not called for any of the DataFrameWriter
> methods like saveAsTable, save, etc.
[jira] [Created] (SPARK-18120) QueryExecutionListener methods don't get executed for DataFrameWriter methods
Salil Surendran created SPARK-18120:
---------------------------------------

             Summary: QueryExecutionListener methods don't get executed for DataFrameWriter methods
                 Key: SPARK-18120
                 URL: https://issues.apache.org/jira/browse/SPARK-18120
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.1
            Reporter: Salil Surendran

QueryExecutionListener is a class with methods named onSuccess() and onFailure() that get called when a query is executed. Each of those methods takes a QueryExecution object as a parameter, which can be used for metrics analysis. The listener is invoked for several Dataset methods like take, head, first, collect, etc., but is not called for any of the DataFrameWriter methods like saveAsTable, save, etc.
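The reported behavior can be sketched as a minimal spark-shell repro (the table name `t` is illustrative; `spark` is the shell-provided SparkSession, and `listenerManager` is the standard SparkSession registration point for QueryExecutionListener):

```scala
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Register a listener that logs every callback it receives.
spark.listenerManager.register(new QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    println(s"onSuccess: $funcName")
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
    println(s"onFailure: $funcName")
})

val df = spark.range(5).toDF("id")

// Dataset actions: the listener fires.
df.collect()

// DataFrameWriter methods: per this issue, the listener does not fire.
df.write.mode("overwrite").saveAsTable("t")
```

Since the writer path bypasses the callbacks, any metrics or lineage tooling built on QueryExecutionListener is blind to writes, which is the gap the pending PR aims to close.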