[jira] [Commented] (SPARK-19557) Output parameters are not present in SQL Query Plan

2017-02-11 Thread Salil Surendran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15862528#comment-15862528
 ] 

Salil Surendran commented on SPARK-19557:
-

[~hyukjin.kwon] We don't have this information in the query plan, per the 
discussion in PR https://github.com/apache/spark/pull/16664 .

> Output parameters are not present in SQL Query Plan
> ---
>
> Key: SPARK-19557
> URL: https://issues.apache.org/jira/browse/SPARK-19557
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Salil Surendran
>
> For DataFrameWriter methods like parquet(), json(), csv() etc., output 
> parameters are not present in the QueryExecution object. For methods like 
> saveAsTable() they are.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19558) Provide a config option to attach QueryExecutionListener to SparkSession

2017-02-10 Thread Salil Surendran (JIRA)
Salil Surendran created SPARK-19558:
---

 Summary: Provide a config option to attach QueryExecutionListener 
to SparkSession
 Key: SPARK-19558
 URL: https://issues.apache.org/jira/browse/SPARK-19558
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.1.0
Reporter: Salil Surendran


Provide a configuration property (just like spark.extraListeners) to attach a 
QueryExecutionListener to a SparkSession.
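For context, a sketch of how a listener is attached today versus how a config-driven hookup might look. This assumes a running SparkSession; the AuditListener class is made up, and the property name in the comment is illustrative only, not an existing setting:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// A trivial listener; the class name is hypothetical.
class AuditListener extends QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    println(s"$funcName succeeded in ${durationNs / 1000000} ms")
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
    println(s"$funcName failed: ${exception.getMessage}")
}

val spark = SparkSession.builder().getOrCreate()

// Today: every application must register the listener programmatically,
// on each session it creates.
spark.listenerManager.register(new AuditListener)

// Proposed: name the listener class in configuration instead, analogous to
// spark.extraListeners, e.g. (property name below is illustrative only):
//   spark-submit --conf spark.sql.queryExecutionListeners=com.example.AuditListener ...
```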






[jira] [Created] (SPARK-19557) Output parameters are not present in SQL Query Plan

2017-02-10 Thread Salil Surendran (JIRA)
Salil Surendran created SPARK-19557:
---

 Summary: Output parameters are not present in SQL Query Plan
 Key: SPARK-19557
 URL: https://issues.apache.org/jira/browse/SPARK-19557
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.1.0
Reporter: Salil Surendran


For DataFrameWriter methods like parquet(), json(), csv() etc., output 
parameters are not present in the QueryExecution object. For methods like 
saveAsTable() they are.
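A hedged sketch of the reported asymmetry, assuming a spark-shell session in Spark 2.1; the table name and output path are made-up examples:

```scala
// Sketch of the reported asymmetry; names and paths are examples only.
val df = spark.range(10).toDF("id")

// saveAsTable(): the destination table is part of the logical plan, so it
// appears in the QueryExecution object that listeners and tooling can inspect.
df.write.mode("overwrite").saveAsTable("example_table")

// parquet()/json()/csv(): the output path is handled outside the plan, so the
// QueryExecution object carries no record of where the data was written.
df.write.mode("overwrite").parquet("/tmp/example_output")
```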






[jira] [Commented] (SPARK-18120) QueryExecutionListener method doesn't get executed for DataFrameWriter methods

2017-01-20 Thread Salil Surendran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831719#comment-15831719
 ] 

Salil Surendran commented on SPARK-18120:
-

Sorry for the delay, but I ran into some unexpected unit test failures. I'm 
rerunning the whole test suite to make sure nothing is broken.

> QueryExecutionListener method doesn't get executed for DataFrameWriter methods
> --
>
> Key: SPARK-18120
> URL: https://issues.apache.org/jira/browse/SPARK-18120
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Salil Surendran
>
> QueryExecutionListener is a class that has methods named onSuccess() and 
> onFailure() that get called when a query is executed. Each of those methods 
> takes a QueryExecution object as a parameter, which can be used for metrics 
> analysis. The methods get called for several of the Dataset methods like 
> take, head, first, collect etc., but don't get called for any of the 
> DataFrameWriter methods like saveAsTable, save etc.






[jira] [Commented] (SPARK-18120) QueryExecutionListener method doesn't get executed for DataFrameWriter methods

2017-01-19 Thread Salil Surendran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830630#comment-15830630
 ] 

Salil Surendran commented on SPARK-18120:
-

[~thomastechs] I will be making a PR today.

> QueryExecutionListener method doesn't get executed for DataFrameWriter methods
> --
>
> Key: SPARK-18120
> URL: https://issues.apache.org/jira/browse/SPARK-18120
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Salil Surendran
>
> QueryExecutionListener is a class that has methods named onSuccess() and 
> onFailure() that get called when a query is executed. Each of those methods 
> takes a QueryExecution object as a parameter, which can be used for metrics 
> analysis. The methods get called for several of the Dataset methods like 
> take, head, first, collect etc., but don't get called for any of the 
> DataFrameWriter methods like saveAsTable, save etc.






[jira] [Created] (SPARK-18889) Spark incorrectly reads default columns from a Hive view

2016-12-15 Thread Salil Surendran (JIRA)
Salil Surendran created SPARK-18889:
---

 Summary: Spark incorrectly reads default columns from a Hive view
 Key: SPARK-18889
 URL: https://issues.apache.org/jira/browse/SPARK-18889
 Project: Spark
  Issue Type: Bug
Reporter: Salil Surendran


Spark fails to read a view that has columns that are given default names.
To reproduce, follow these steps in Hive:

* CREATE TABLE IF NOT EXISTS employee_details (eid int, name String,
  salary String, destination String, json String)
  COMMENT 'Employee details'
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
  STORED AS TEXTFILE;
* insert into employee_details values (100, "Salil", "100k", "Mumbai",
  '{"Foo":"ABC","Bar":"2009010110","Quux":{"QuuxId":1234,"QuuxName":"Sam"}}');
* create view employee_25 as select eid, name, `_c4` from (select eid, name,
  destination, v1.foo, cast(v1.bar as timestamp) from employee_details LATERAL
  VIEW json_tuple(json, 'Foo', 'Bar') v1 as foo, bar) v2;
* select * from employee_25;

You will see an output like this:
+------------------+-------------------+------------------+
| employee_25.eid  | employee_25.name  | employee_25._c4  |
+------------------+-------------------+------------------+
| 100              | Salil             | NULL             |
+------------------+-------------------+------------------+

Now go to spark-shell and try to query the view:
scala> spark.sql("select * from employee_25").show
org.apache.spark.sql.AnalysisException: cannot resolve '`v2._c4`' given input 
columns: [foo, name, eid, bar, destination]; line 1 pos 32;
'Project [*]
+- 'SubqueryAlias employee_25
   +- 'Project [eid#56, name#57, 'v2._c4]
      +- SubqueryAlias v2
         +- Project [eid#56, name#57, destination#59, foo#61, cast(bar#62 as timestamp) AS bar#63]
            +- Generate json_tuple(json#60, Foo, Bar), true, false, v1, [foo#61, bar#62]
               +- MetastoreRelation default, employee_details

  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:308)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:308)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:307)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:269)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:279)
  at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:283)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.immutable.List.map(List.scala:285)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:283)
  at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$8.apply(QueryPlan.scala:288)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:186)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:288)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:74)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:67)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:125)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at

[jira] [Created] (SPARK-18833) Changing partition location using the 'ALTER TABLE .. SET LOCATION' command via beeline doesn't get reflected in Spark

2016-12-12 Thread Salil Surendran (JIRA)
Salil Surendran created SPARK-18833:
---

 Summary: Changing partition location using the 'ALTER TABLE .. SET 
LOCATION' command via beeline doesn't get reflected in Spark
 Key: SPARK-18833
 URL: https://issues.apache.org/jira/browse/SPARK-18833
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.0.2
Reporter: Salil Surendran


Change the partition location of a table via beeline using the 'ALTER TABLE' 
command. spark-shell then doesn't find any of the data from the table, even 
though the data can still be read via beeline. To reproduce, do the following:

=== At Hive side: ===
hive> CREATE EXTERNAL TABLE testA (id STRING, name STRING) PARTITIONED BY (idP 
STRING) STORED AS PARQUET LOCATION '/user/root/A/' ;
hive> CREATE EXTERNAL TABLE testB (id STRING, name STRING) PARTITIONED BY (idP 
STRING) STORED AS PARQUET LOCATION '/user/root/B/' ;
hive> CREATE EXTERNAL TABLE testC (id STRING, name STRING) PARTITIONED BY (idP 
STRING) STORED AS PARQUET LOCATION '/user/root/C/' ;

hive> insert into table testA PARTITION (idP='1') values 
('1',"test"),('2',"test2");

hive> ALTER TABLE testB ADD IF NOT EXISTS PARTITION(idP='1');
hive> ALTER TABLE testB PARTITION (idP='1') SET LOCATION '/user/root/A/idp=1/';

hive> select * from testA;
OK
1 test 1
2 test2 1


hive> select * from testB;
OK
1 test 1
2 test2 1

Conclusion: in Hive, changing the location to the directory where the Parquet 
files are present worked.


=== At Spark side: ===
scala> import org.apache.spark.sql.hive.HiveContext
scala> val hiveContext = new HiveContext(sc)

scala> hiveContext.refreshTable("testB")

scala> hiveContext.sql("select * from testB").count
res2: Long = 0

scala> hiveContext.sql("ALTER TABLE testC ADD IF NOT EXISTS PARTITION(idP='1')")
res3: org.apache.spark.sql.DataFrame = [result: string]

scala> hiveContext.sql("ALTER TABLE testC PARTITION (idP='1') SET LOCATION 
'/user/root/A/idp=1/' ")
res4: org.apache.spark.sql.DataFrame = [result: string]

scala> hiveContext.sql("select * from testC").count
res6: Long = 0

scala> hiveContext.refreshTable("testC")

scala> hiveContext.sql("select * from testC").count
res8: Long = 0 






[jira] [Commented] (SPARK-18120) QueryExecutionListener method doesn't get executed for DataFrameWriter methods

2016-11-29 Thread Salil Surendran (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15705756#comment-15705756
 ] 

Salil Surendran commented on SPARK-18120:
-

[~r...@databricks.com] No, it doesn't get triggered. [~jayadevan.m] I already 
have the code written and ready, and will be making a PR soon.

> QueryExecutionListener method doesn't get executed for DataFrameWriter methods
> --
>
> Key: SPARK-18120
> URL: https://issues.apache.org/jira/browse/SPARK-18120
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Salil Surendran
>
> QueryExecutionListener is a class that has methods named onSuccess() and 
> onFailure() that get called when a query is executed. Each of those methods 
> takes a QueryExecution object as a parameter, which can be used for metrics 
> analysis. The methods get called for several of the Dataset methods like 
> take, head, first, collect etc., but don't get called for any of the 
> DataFrameWriter methods like saveAsTable, save etc.






[jira] [Created] (SPARK-18120) QueryExecutionListener method doesn't get executed for DataFrameWriter methods

2016-10-26 Thread Salil Surendran (JIRA)
Salil Surendran created SPARK-18120:
---

 Summary: QueryExecutionListener method doesn't get executed for 
DataFrameWriter methods
 Key: SPARK-18120
 URL: https://issues.apache.org/jira/browse/SPARK-18120
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.1
Reporter: Salil Surendran


QueryExecutionListener is a class that has methods named onSuccess() and 
onFailure() that get called when a query is executed. Each of those methods 
takes a QueryExecution object as a parameter, which can be used for metrics 
analysis. The methods get called for several of the Dataset methods like take, 
head, first, collect etc., but don't get called for any of the DataFrameWriter 
methods like saveAsTable, save etc.
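A minimal way to observe the reported behavior, as a sketch assuming a spark-shell session on an affected version; the RecordingListener class and output path are made up:

```scala
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Hypothetical listener that just records which actions it saw.
// (A plain var is fine for an interactive sketch; production code
// would need thread-safe accumulation.)
class RecordingListener extends QueryExecutionListener {
  var seen: List[String] = Nil
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    seen = funcName :: seen
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
    seen = funcName :: seen
}

val listener = new RecordingListener
spark.listenerManager.register(listener)

// Dataset actions like collect, take, head, first notify the listener.
spark.range(10).toDF("id").collect()

// DataFrameWriter methods like save/saveAsTable/parquet do not, per this
// report, so `listener.seen` would show only the collect call afterwards.
spark.range(10).toDF("id").write.mode("overwrite").parquet("/tmp/listener_test")
```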


