[jira] [Created] (SPARK-28897) Invalid usage of '*' in expression 'coalesce' error when executing dataframe.na.fill(0)

2019-08-28 Thread Saurabh Santhosh (Jira)
Saurabh Santhosh created SPARK-28897:


 Summary: Invalid usage of '*' in expression 'coalesce' error when 
executing dataframe.na.fill(0)
 Key: SPARK-28897
 URL: https://issues.apache.org/jira/browse/SPARK-28897
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: Saurabh Santhosh


Getting the following error when trying to execute the following statements:

 
{code:java}
var df = spark.sql(s"select * from default.test_table")
df.na.fill(0)
{code}
This error happens when the following property is set
{code:java}
spark.sql("set spark.sql.parser.quotedRegexColumnNames=true")
{code}
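Since the failure only shows up while this flag is enabled, one possible workaround (an untested sketch, not from the original report) is to turn the flag off around the na.fill call and restore it afterwards:
{code:java}
// Workaround sketch: the error only occurs with the flag enabled, so disable
// it, run the fill, then restore the previous setting.
spark.sql("set spark.sql.parser.quotedRegexColumnNames=false")
val filled = df.na.fill(0)
spark.sql("set spark.sql.parser.quotedRegexColumnNames=true")
{code}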
Error :
{code:java}
org.apache.spark.sql.AnalysisException: Invalid usage of '*' in expression 
'coalesce';   at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:42)
   at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:95) 
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$expandStarExpression$1.applyOrElse(Analyzer.scala:1021)
   at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$expandStarExpression$1.applyOrElse(Analyzer.scala:997)
   at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:278)
   at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:278)
   at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
   at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:277)   
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275)
   at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275)
   at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326)
   at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
   at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324)   
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:275) 
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.expandStarExpression(Analyzer.scala:997)
   at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$buildExpandedProjectList$1.apply(Analyzer.scala:982)
   at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$buildExpandedProjectList$1.apply(Analyzer.scala:977)
   at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
   at 
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
   at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)  
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)   at 
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)   at 
scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)   at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$buildExpandedProjectList(Analyzer.scala:977)
   at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9.applyOrElse(Analyzer.scala:905)
   at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9.applyOrElse(Analyzer.scala:900)
   at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
   at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
   at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
   at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
   at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
   at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
   at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
   at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
   at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:900)
   at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.apply(Analyzer.scala:758)
   at 
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
   at 

[jira] [Commented] (SPARK-23012) Support for predicate pushdown and partition pruning when left joining large Hive tables

2019-03-08 Thread Saurabh Santhosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787716#comment-16787716
 ] 

Saurabh Santhosh commented on SPARK-23012:
--

[~yumwang] [~reks95]

Hi, tested this in Spark 2.4.0 and it's working fine :)

> Support for predicate pushdown and partition pruning when left joining large 
> Hive tables
> 
>
> Key: SPARK-23012
> URL: https://issues.apache.org/jira/browse/SPARK-23012
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 2.2.0
>Reporter: Rick Kramer
>Priority: Major
>
> We have a hive view which left outer joins several large, partitioned orc 
> hive tables together on date. When the view is used in a hive query, hive 
> pushes date predicates down into the joins and prunes the partitions for all 
> tables. When I use this view from pyspark, the predicate is only used to 
> prune the left-most table and all partitions from the additional tables are 
> selected.
> For example, consider two partitioned hive tables a & b joined in a view:
> create table a (
>a_val string
> )
> partitioned by (ds string)
> stored as orc;
> create table b (
>b_val string
> )
> partitioned by (ds string)
> stored as orc;
> create view example_view as
> select
> a_val
> , b_val
> , ds
> from a 
> left outer join b on b.ds = a.ds
> Then in pyspark you might try to query from the view filtering on ds:
> spark.table('example_view').filter(F.col('ds') == '2018-01-01')
> If tables a and b are large, this results in a plan that filters a on ds = 
> 2018-01-01 but scans all partitions of table b.
> If the join in the view is changed to an inner join, the predicate gets 
> pushed down to a & b and the partitions are pruned as you'd expect.
> In practice, the view is fairly complex and contains a lot of business logic 
> we'd prefer not to replicate in pyspark if we can avoid it.
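As a quick check on a given Spark version, the physical plan shows whether the ds predicate reaches both scans. A minimal Scala sketch (assuming the example_view described above exists in the metastore):
{code}
import org.apache.spark.sql.functions.col

// Look for the ds filter in the partition/pushed filters of both table scans
// in the printed plan; if it only appears for table a, b is not being pruned.
spark.table("example_view")
  .filter(col("ds") === "2018-01-01")
  .explain(true)
{code}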



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23012) Support for predicate pushdown and partition pruning when left joining large Hive tables

2019-03-07 Thread Saurabh Santhosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787493#comment-16787493
 ] 

Saurabh Santhosh commented on SPARK-23012:
--

[~yumwang] Thanks for the quick response. Will check and let you know :)

> Support for predicate pushdown and partition pruning when left joining large 
> Hive tables
> 
>
> Key: SPARK-23012
> URL: https://issues.apache.org/jira/browse/SPARK-23012
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 2.2.0
>Reporter: Rick Kramer
>Priority: Major
>
> We have a hive view which left outer joins several large, partitioned orc 
> hive tables together on date. When the view is used in a hive query, hive 
> pushes date predicates down into the joins and prunes the partitions for all 
> tables. When I use this view from pyspark, the predicate is only used to 
> prune the left-most table and all partitions from the additional tables are 
> selected.
> For example, consider two partitioned hive tables a & b joined in a view:
> create table a (
>a_val string
> )
> partitioned by (ds string)
> stored as orc;
> create table b (
>b_val string
> )
> partitioned by (ds string)
> stored as orc;
> create view example_view as
> select
> a_val
> , b_val
> , ds
> from a 
> left outer join b on b.ds = a.ds
> Then in pyspark you might try to query from the view filtering on ds:
> spark.table('example_view').filter(F.col('ds') == '2018-01-01')
> If tables a and b are large, this results in a plan that filters a on ds = 
> 2018-01-01 but scans all partitions of table b.
> If the join in the view is changed to an inner join, the predicate gets 
> pushed down to a & b and the partitions are pruned as you'd expect.
> In practice, the view is fairly complex and contains a lot of business logic 
> we'd prefer not to replicate in pyspark if we can avoid it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23012) Support for predicate pushdown and partition pruning when left joining large Hive tables

2019-03-07 Thread Saurabh Santhosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787484#comment-16787484
 ] 

Saurabh Santhosh commented on SPARK-23012:
--

[~reks95] How did you fix this issue?

> Support for predicate pushdown and partition pruning when left joining large 
> Hive tables
> 
>
> Key: SPARK-23012
> URL: https://issues.apache.org/jira/browse/SPARK-23012
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 2.2.0
>Reporter: Rick Kramer
>Priority: Major
>
> We have a hive view which left outer joins several large, partitioned orc 
> hive tables together on date. When the view is used in a hive query, hive 
> pushes date predicates down into the joins and prunes the partitions for all 
> tables. When I use this view from pyspark, the predicate is only used to 
> prune the left-most table and all partitions from the additional tables are 
> selected.
> For example, consider two partitioned hive tables a & b joined in a view:
> create table a (
>a_val string
> )
> partitioned by (ds string)
> stored as orc;
> create table b (
>b_val string
> )
> partitioned by (ds string)
> stored as orc;
> create view example_view as
> select
> a_val
> , b_val
> , ds
> from a 
> left outer join b on b.ds = a.ds
> Then in pyspark you might try to query from the view filtering on ds:
> spark.table('example_view').filter(F.col('ds') == '2018-01-01')
> If tables a and b are large, this results in a plan that filters a on ds = 
> 2018-01-01 but scans all partitions of table b.
> If the join in the view is changed to an inner join, the predicate gets 
> pushed down to a & b and the partitions are pruned as you'd expect.
> In practice, the view is fairly complex and contains a lot of business logic 
> we'd prefer not to replicate in pyspark if we can avoid it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23012) Support for predicate pushdown and partition pruning when left joining large Hive tables

2019-03-07 Thread Saurabh Santhosh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787483#comment-16787483
 ] 

Saurabh Santhosh commented on SPARK-23012:
--

[~yumwang] Any update on this? We are also having the same issue. Can you tell 
me in which version this is fixed?

Thanks

> Support for predicate pushdown and partition pruning when left joining large 
> Hive tables
> 
>
> Key: SPARK-23012
> URL: https://issues.apache.org/jira/browse/SPARK-23012
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 2.2.0
>Reporter: Rick Kramer
>Priority: Major
>
> We have a hive view which left outer joins several large, partitioned orc 
> hive tables together on date. When the view is used in a hive query, hive 
> pushes date predicates down into the joins and prunes the partitions for all 
> tables. When I use this view from pyspark, the predicate is only used to 
> prune the left-most table and all partitions from the additional tables are 
> selected.
> For example, consider two partitioned hive tables a & b joined in a view:
> create table a (
>a_val string
> )
> partitioned by (ds string)
> stored as orc;
> create table b (
>b_val string
> )
> partitioned by (ds string)
> stored as orc;
> create view example_view as
> select
> a_val
> , b_val
> , ds
> from a 
> left outer join b on b.ds = a.ds
> Then in pyspark you might try to query from the view filtering on ds:
> spark.table('example_view').filter(F.col('ds') == '2018-01-01')
> If tables a and b are large, this results in a plan that filters a on ds = 
> 2018-01-01 but scans all partitions of table b.
> If the join in the view is changed to an inner join, the predicate gets 
> pushed down to a & b and the partitions are pruned as you'd expect.
> In practice, the view is fairly complex and contains a lot of business logic 
> we'd prefer not to replicate in pyspark if we can avoid it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14948) Exception when joining DataFrames derived from the same DataFrame

2016-04-27 Thread Saurabh Santhosh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259681#comment-15259681
 ] 

Saurabh Santhosh commented on SPARK-14948:
--

Is https://issues.apache.org/jira/browse/SPARK-11072 going to resolve this 
issue?

> Exception when joining DataFrames derived from the same DataFrame
> -
>
> Key: SPARK-14948
> URL: https://issues.apache.org/jira/browse/SPARK-14948
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Saurabh Santhosh
>
> h2. Spark Analyser is throwing the following exception in a specific scenario 
> :
> h2. Exception :
> org.apache.spark.sql.AnalysisException: resolved attribute(s) F1#3 missing 
> from asd#5,F2#4,F1#6,F2#7 in operator !Project [asd#5,F1#3];
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
> h2. Code :
> {code:title=SparkClient.java|borderStyle=solid}
> StructField[] fields = new StructField[2];
> fields[0] = new StructField("F1", DataTypes.StringType, true, 
> Metadata.empty());
> fields[1] = new StructField("F2", DataTypes.StringType, true, 
> Metadata.empty());
> JavaRDD rdd =
> 
> sparkClient.getJavaSparkContext().parallelize(Arrays.asList(RowFactory.create("a",
>  "b")));
> DataFrame df = sparkClient.getSparkHiveContext().createDataFrame(rdd, new 
> StructType(fields));
> sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t1");
> DataFrame aliasedDf = sparkClient.getSparkHiveContext().sql("select F1 as 
> asd, F2 from t1");
> sparkClient.getSparkHiveContext().registerDataFrameAsTable(aliasedDf, 
> "t2");
> sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t3");
> 
> DataFrame join = aliasedDf.join(df, 
> aliasedDf.col("F2").equalTo(df.col("F2")), "inner");
> DataFrame select = join.select(aliasedDf.col("asd"), df.col("F1"));
> select.collect();
> {code}
> h2. Observations :
> * This issue is related to the Data Type of Fields of the initial Data 
> Frame.(If the Data Type is not String, it will work.)
> * It works fine if the data frame is registered as a temporary table and an 
> sql (select a.asd,b.F1 from t2 a inner join t3 b on a.F2=b.F2) is written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11072) simplify self join handling

2016-04-27 Thread Saurabh Santhosh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259653#comment-15259653
 ] 

Saurabh Santhosh commented on SPARK-11072:
--

Will this resolve https://issues.apache.org/jira/browse/SPARK-14948 ?
Can you please add a test case covering this scenario for future releases

> simplify self join handling
> ---
>
> Key: SPARK-11072
> URL: https://issues.apache.org/jira/browse/SPARK-11072
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>
> Self-join is a diamond problem that confuses our analyzer. Our current 
> solution is creating new instances of leaf nodes in the right tree of the join 
> node, and updating all attribute references there. Thus there is no diamond 
> anymore and the problem is fixed.
> However, our execution engine can handle diamond plans and we only need to 
> distinguish the output between left and right. So we can simplify the 
> self-join handling by introducing a new Plan `NewOutput` to give different 
> output attributes.
> The extra `NewOutput` layer is quite cheap and can be completely removed when 
> we have local nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10925) Exception when joining DataFrames

2016-04-27 Thread Saurabh Santhosh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259650#comment-15259650
 ] 

Saurabh Santhosh commented on SPARK-10925:
--

[~smilegator]

Hi,
I have created another ticket with a specific test case to reproduce this issue: 
https://issues.apache.org/jira/browse/SPARK-14948

> Exception when joining DataFrames
> -
>
> Key: SPARK-10925
> URL: https://issues.apache.org/jira/browse/SPARK-10925
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0, 1.5.1
> Environment: Tested with Spark 1.5.0 and Spark 1.5.1
>Reporter: Alexis Seigneurin
> Attachments: Photo 05-10-2015 14 31 16.jpg, TestCase2.scala
>
>
> I get an exception when joining a DataFrame with another DataFrame. The 
> second DataFrame was created by performing an aggregation on the first 
> DataFrame.
> My complete workflow is:
> # read the DataFrame
> # apply an UDF on column "name"
> # apply an UDF on column "surname"
> # apply an UDF on column "birthDate"
> # aggregate on "name" and re-join with the DF
> # aggregate on "surname" and re-join with the DF
> If I remove one step, the process completes normally.
> Here is the exception:
> {code}
> Exception in thread "main" org.apache.spark.sql.AnalysisException: resolved 
> attribute(s) surname#20 missing from id#0,birthDate#3,name#10,surname#7 in 
> operator !Project [id#0,birthDate#3,name#10,surname#20,UDF(birthDate#3) AS 
> birthDate_cleaned#8];
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:37)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:154)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:49)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:103)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:102)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:102)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:102)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:102)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:102)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:102)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:49)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:914)
>   at org.apache.spark.sql.DataFrame.(DataFrame.scala:132)
>   at 
> org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$logicalPlanToDataFrame(DataFrame.scala:154)
>   at org.apache.spark.sql.DataFrame.join(DataFrame.scala:553)
>   at org.apache.spark.sql.DataFrame.join(DataFrame.scala:520)
>   at TestCase2$.main(TestCase2.scala:51)
>   at TestCase2.main(TestCase2.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> 
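A minimal Scala sketch of the aggregate-and-rejoin pattern described in the workflow above (the column names, the UDF, and df are illustrative, not taken from the attached test case):
{code}
import org.apache.spark.sql.functions.{col, udf}

// Clean a column with a UDF, aggregate on it, then join the aggregate back
// onto the original DataFrame, mirroring the shape of steps 2-6 above.
val cleanName = udf((s: String) => if (s == null) null else s.trim)
val cleaned = df.withColumn("name", cleanName(col("name")))
val byName = cleaned.groupBy("name").count()
val rejoined = cleaned.join(byName, "name")
{code}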

[jira] [Updated] (SPARK-14948) Exception when joining DataFrames derived from the same DataFrame

2016-04-27 Thread Saurabh Santhosh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Santhosh updated SPARK-14948:
-
Description: 
h2. Spark Analyser is throwing the following exception in a specific scenario :

h2. Exception :

org.apache.spark.sql.AnalysisException: resolved attribute(s) F1#3 missing from 
asd#5,F2#4,F1#6,F2#7 in operator !Project [asd#5,F1#3];
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)


h2. Code :

{code:title=SparkClient.java|borderStyle=solid}
StructField[] fields = new StructField[2];
fields[0] = new StructField("F1", DataTypes.StringType, true, 
Metadata.empty());
fields[1] = new StructField("F2", DataTypes.StringType, true, 
Metadata.empty());
JavaRDD rdd =

sparkClient.getJavaSparkContext().parallelize(Arrays.asList(RowFactory.create("a",
 "b")));
DataFrame df = sparkClient.getSparkHiveContext().createDataFrame(rdd, new 
StructType(fields));
sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t1");

DataFrame aliasedDf = sparkClient.getSparkHiveContext().sql("select F1 as 
asd, F2 from t1");

sparkClient.getSparkHiveContext().registerDataFrameAsTable(aliasedDf, "t2");
sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t3");

DataFrame join = aliasedDf.join(df, 
aliasedDf.col("F2").equalTo(df.col("F2")), "inner");
DataFrame select = join.select(aliasedDf.col("asd"), df.col("F1"));
select.collect();

{code}

h2. Observations :

* This issue is related to the Data Type of Fields of the initial Data 
Frame.(If the Data Type is not String, it will work.)
* It works fine if the data frame is registered as a temporary table and an sql 
(select a.asd,b.F1 from t2 a inner join t3 b on a.F2=b.F2) is written.

  was:
h2. Spark Analyser is throwing the following exception in a specific scenario :

h2. Exception :

org.apache.spark.sql.AnalysisException: resolved attribute(s) F1#3 missing from 
asd#5,F2#4,F1#6,F2#7 in operator !Project [asd#5,F1#3];
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)


h2. Code :

{code:title=SparkClient.java|borderStyle=solid}
StructField[] fields = new StructField[2];
fields[0] = new StructField("F1", DataTypes.StringType, true, 
Metadata.empty());
fields[1] = new StructField("F2", DataTypes.StringType, true, 
Metadata.empty());
JavaRDD rdd =

sparkClient.getJavaSparkContext().parallelize(Arrays.asList(RowFactory.create("a",
 "b")));
DataFrame df = sparkClient.getSparkHiveContext().createDataFrame(rdd, new 
StructType(fields));
sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t1");

DataFrame aliasedDf = sparkClient.getSparkHiveContext().sql("select F1 as 
asd, F2 from t1");

sparkClient.getSparkHiveContext().registerDataFrameAsTable(aliasedDf, "t2");
sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t3");

DataFrame join = aliasedDf.join(df, 
aliasedDf.col("F2").equalTo(df.col("F2")), "inner");
DataFrame select = join.select(aliasedDf.col("asd"), df.col("F1"));
select.collect();

{code}

h2. Observations :

* This issue is related to the Data Type of Fields of the initial Data 
Frame.(If the Data Type is not String, it will work.)
* It works fine if the data frame is registered as a temporary table and an sql 
(select a.asd,b.F1 from t2 a inner join t3 b on a.F2=b.F2) is written.


> Exception when joining DataFrames derived from the same DataFrame
> -
>
> Key: SPARK-14948
> URL: https://issues.apache.org/jira/browse/SPARK-14948
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Saurabh Santhosh
>
> h2. Spark Analyser is throwing the following exception in a specific scenario 
> :
> h2. Exception :
> org.apache.spark.sql.AnalysisException: resolved attribute(s) F1#3 missing 
> from asd#5,F2#4,F1#6,F2#7 in operator !Project [asd#5,F1#3];
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
> h2. Code :
> {code:title=SparkClient.java|borderStyle=solid}
> StructField[] fields = new StructField[2];
> fields[0] = new StructField("F1", DataTypes.StringType, true, 
> Metadata.empty());
> fields[1] = new StructField("F2", DataTypes.StringType, true, 
> Metadata.empty());
> JavaRDD rdd =
> 
> sparkClient.getJavaSparkContext().parallelize(Arrays.asList(RowFactory.create("a",
>  "b")));
> DataFrame df = sparkClient.getSparkHiveContext().createDataFrame(rdd, new 
> StructType(fields));
> sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t1");
> DataFrame aliasedDf = 

[jira] [Commented] (SPARK-14948) Exception when joining DataFrames derived from the same DataFrame

2016-04-27 Thread Saurabh Santhosh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259646#comment-15259646
 ] 

Saurabh Santhosh commented on SPARK-14948:
--

This issue is related to https://issues.apache.org/jira/browse/SPARK-10925. But 
the other one is very generic and does not pinpoint the issue correctly.

> Exception when joining DataFrames derived from the same DataFrame
> -
>
> Key: SPARK-14948
> URL: https://issues.apache.org/jira/browse/SPARK-14948
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Saurabh Santhosh
>
> h2. Spark Analyser is throwing the following exception in a specific scenario 
> :
> h2. Exception :
> org.apache.spark.sql.AnalysisException: resolved attribute(s) F1#3 missing 
> from asd#5,F2#4,F1#6,F2#7 in operator !Project [asd#5,F1#3];
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
> h2. Code :
> {code:title=SparkClient.java|borderStyle=solid}
> StructField[] fields = new StructField[2];
> fields[0] = new StructField("F1", DataTypes.StringType, true, 
> Metadata.empty());
> fields[1] = new StructField("F2", DataTypes.StringType, true, 
> Metadata.empty());
> JavaRDD rdd =
> 
> sparkClient.getJavaSparkContext().parallelize(Arrays.asList(RowFactory.create("a",
>  "b")));
> DataFrame df = sparkClient.getSparkHiveContext().createDataFrame(rdd, new 
> StructType(fields));
> sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t1");
> DataFrame aliasedDf = sparkClient.getSparkHiveContext().sql("select F1 as 
> asd, F2 from t1");
> sparkClient.getSparkHiveContext().registerDataFrameAsTable(aliasedDf, 
> "t2");
> sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t3");
> 
> DataFrame join = aliasedDf.join(df, 
> aliasedDf.col("F2").equalTo(df.col("F2")), "inner");
> DataFrame select = join.select(aliasedDf.col("asd"), df.col("F1"));
> select.collect();
> {code}
> h2. Observations :
> * This issue is related to the Data Type of Fields of the initial Data 
> Frame.(If the Data Type is not String, it will work.)
> * It works fine if the data frame is registered as a temporary table and an 
> sql (select a.asd,b.F1 from t2 a inner join t3 b on a.F2=b.F2) is written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14948) Exception when joining DataFrames derived from the same DataFrame

2016-04-27 Thread Saurabh Santhosh (JIRA)
Saurabh Santhosh created SPARK-14948:


 Summary: Exception when joining DataFrames derived from the same 
DataFrame
 Key: SPARK-14948
 URL: https://issues.apache.org/jira/browse/SPARK-14948
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
Reporter: Saurabh Santhosh


h2. Spark Analyser is throwing the following exception in a specific scenario :

h2. Exception :

org.apache.spark.sql.AnalysisException: resolved attribute(s) F1#3 missing from 
asd#5,F2#4,F1#6,F2#7 in operator !Project [asd#5,F1#3];
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)


h2. Code :

{code:title=SparkClient.java|borderStyle=solid}
StructField[] fields = new StructField[2];
fields[0] = new StructField("F1", DataTypes.StringType, true, 
Metadata.empty());
fields[1] = new StructField("F2", DataTypes.StringType, true, 
Metadata.empty());
JavaRDD<Row> rdd =

sparkClient.getJavaSparkContext().parallelize(Arrays.asList(RowFactory.create("a",
 "b")));
DataFrame df = sparkClient.getSparkHiveContext().createDataFrame(rdd, new 
StructType(fields));
sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t1");

DataFrame aliasedDf = sparkClient.getSparkHiveContext().sql("select F1 as 
asd, F2 from t1");

sparkClient.getSparkHiveContext().registerDataFrameAsTable(aliasedDf, "t2");
sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t3");

DataFrame join = aliasedDf.join(df, 
aliasedDf.col("F2").equalTo(df.col("F2")), "inner");
DataFrame select = join.select(aliasedDf.col("asd"), df.col("F1"));
select.collect();

{code}

h2. Observations :

* This issue is related to the data type of the fields of the initial DataFrame 
(if the data type is not String, it works).
* It works fine if the DataFrame is registered as a temporary table and a SQL 
query (select a.asd,b.F1 from t2 a inner join t3 b on a.F2=b.F2) is written.
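A minimal Scala sketch of the temporary-table workaround from the second observation (assuming a HiveContext named sqlContext standing in for the report's getSparkHiveContext(), and the df/aliasedDf built above):
{code}
// Route the self-join through registered temp tables so the attributes are
// resolved by SQL name rather than through the reused Column objects.
aliasedDf.registerTempTable("t2")
df.registerTempTable("t3")
val result = sqlContext.sql(
  "select a.asd, b.F1 from t2 a inner join t3 b on a.F2 = b.F2")
result.collect()
{code}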



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11633) HiveContext throws TreeNode Exception : Failed to Copy Node

2015-11-17 Thread Saurabh Santhosh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010228#comment-15010228
 ] 

Saurabh Santhosh commented on SPARK-11633:
--

[~smilegator] Tried with your change. Works fine. :D

> HiveContext throws TreeNode Exception : Failed to Copy Node
> ---
>
> Key: SPARK-11633
> URL: https://issues.apache.org/jira/browse/SPARK-11633
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0, 1.5.1
>Reporter: Saurabh Santhosh
>Priority: Critical
>
> h2. HiveContext#sql is throwing the following exception in a specific 
> scenario :
> h2. Exception :
> Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
> Failed to copy node.
> Is otherCopyArgs specified correctly for LogicalRDD.
> Exception message: wrong number of arguments
> ctor: public org.apache.spark.sql.execution.LogicalRDD
> (scala.collection.Seq,org.apache.spark.rdd.RDD,org.apache.spark.sql.SQLContext)?
> h2. Code :
> {code:title=SparkClient.java|borderStyle=solid}
> StructField[] fields = new StructField[2];
> fields[0] = new StructField("F1", DataTypes.StringType, true, 
> Metadata.empty());
> fields[1] = new StructField("F2", DataTypes.StringType, true, 
> Metadata.empty());
> 
> JavaRDD rdd = 
> javaSparkContext.parallelize(Arrays.asList(RowFactory.create("", "")));
> DataFrame df = sparkHiveContext.createDataFrame(rdd, new StructType(fields));
> sparkHiveContext.registerDataFrameAsTable(df, "t1");
> DataFrame aliasedDf = sparkHiveContext.sql("select f1, F2 as F2 from t1");
> sparkHiveContext.registerDataFrameAsTable(aliasedDf, "t2");
> sparkHiveContext.registerDataFrameAsTable(aliasedDf, "t3");
> sparkHiveContext.sql("select a.F1 from t2 a inner join t3 b on a.F2=b.F2");
> {code}
> h2. Observations :
> * if F1(exact name of field) is used instead of f1, the code works correctly.
> * If alias is not used for F2, then also code works irrespective of case of 
> F1.
> * if Field F2 is not used in the final query also the code works correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11633) HiveContext throws TreeNode Exception : Failed to Copy Node

2015-11-16 Thread Saurabh Santhosh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006493#comment-15006493
 ] 

Saurabh Santhosh commented on SPARK-11633:
--

I tried using version 1.4.1 as well as 1.5.1.
In both cases, I got the same error.

Following are the dependencies used (with versions 1.4.1 and 1.5.1):

compile ('org.apache.spark:spark-core_2.10:1.5.1') { exclude group: 
"org.jboss.netty" }
compile ('org.apache.spark:spark-sql_2.10:1.5.1') { exclude group: 
"org.jboss.netty" }
compile ('org.apache.spark:spark-hive_2.10:1.5.1') {
exclude group: "org.jboss.netty"
exclude group: "org.mortbay.jetty"
  }

Do you want me to share anything else?

> HiveContext throws TreeNode Exception : Failed to Copy Node
> ---
>
> Key: SPARK-11633
> URL: https://issues.apache.org/jira/browse/SPARK-11633
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0, 1.5.1
>Reporter: Saurabh Santhosh
>Priority: Critical
>
> h2. HiveContext#sql is throwing the following exception in a specific 
> scenario :
> h2. Exception :
> Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
> Failed to copy node.
> Is otherCopyArgs specified correctly for LogicalRDD.
> Exception message: wrong number of arguments
> ctor: public org.apache.spark.sql.execution.LogicalRDD
> (scala.collection.Seq,org.apache.spark.rdd.RDD,org.apache.spark.sql.SQLContext)?
> h2. Code :
> {code:title=SparkClient.java|borderStyle=solid}
> StructField[] fields = new StructField[2];
> fields[0] = new StructField("F1", DataTypes.StringType, true, 
> Metadata.empty());
> fields[1] = new StructField("F2", DataTypes.StringType, true, 
> Metadata.empty());
> 
> JavaRDD rdd = 
> javaSparkContext.parallelize(Arrays.asList(RowFactory.create("", "")));
> DataFrame df = sparkHiveContext.createDataFrame(rdd, new StructType(fields));
> sparkHiveContext.registerDataFrameAsTable(df, "t1");
> DataFrame aliasedDf = sparkHiveContext.sql("select f1, F2 as F2 from t1");
> sparkHiveContext.registerDataFrameAsTable(aliasedDf, "t2");
> sparkHiveContext.registerDataFrameAsTable(aliasedDf, "t3");
> sparkHiveContext.sql("select a.F1 from t2 a inner join t3 b on a.F2=b.F2");
> {code}
> h2. Observations :
> * if F1(exact name of field) is used instead of f1, the code works correctly.
> * If alias is not used for F2, then also code works irrespective of case of 
> F1.
> * if Field F2 is not used in the final query also the code works correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11633) HiveContext throws TreeNode Exception : Failed to Copy Node

2015-11-11 Thread Saurabh Santhosh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000328#comment-15000328
 ] 

Saurabh Santhosh commented on SPARK-11633:
--

{code:title=Stacktrace|borderStyle=solid}

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: makeCopy, tree:
LogicalRDD [F1#2,F2#3], MapPartitionsRDD[2] at createDataFrame at 
SparkClientTest.java:79

at 
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:346)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDown(QueryPlan.scala:96)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressions(QueryPlan.scala:64)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$3.applyOrElse(Analyzer.scala:333)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$3.applyOrElse(Analyzer.scala:332)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:285)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:299)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenUp(TreeNode.scala:329)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:283)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:299)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenUp(TreeNode.scala:329)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:283)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:299)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)

[jira] [Created] (SPARK-11633) HiveContext throws TreeNode Exception : Failed to Copy Node

2015-11-10 Thread Saurabh Santhosh (JIRA)
Saurabh Santhosh created SPARK-11633:


 Summary: HiveContext throws TreeNode Exception : Failed to Copy 
Node
 Key: SPARK-11633
 URL: https://issues.apache.org/jira/browse/SPARK-11633
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.1
Reporter: Saurabh Santhosh
Priority: Critical


h2. HiveContext#sql is throwing the following exception in a specific scenario :

h2. Exception :

Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
Failed to copy node.
Is otherCopyArgs specified correctly for LogicalRDD.
Exception message: wrong number of arguments
ctor: public org.apache.spark.sql.execution.LogicalRDD
(scala.collection.Seq,org.apache.spark.rdd.RDD,org.apache.spark.sql.SQLContext)?

h2. Code :

{code:title=SparkClient.java|borderStyle=solid}
StructField[] fields = new StructField[2];
fields[0] = new StructField("F1", DataTypes.StringType, true, Metadata.empty());
fields[1] = new StructField("F2", DataTypes.StringType, true, Metadata.empty());

JavaRDD<Row> rdd = 
javaSparkContext.parallelize(Arrays.asList(RowFactory.create("", "", 0)));

DataFrame df = sparkHiveContext.createDataFrame(rdd, new StructType(fields));
sparkHiveContext.registerDataFrameAsTable(df, "t1");

DataFrame aliasedDf = sparkHiveContext.sql("select f1, F2 as F2 from t1");

sparkHiveContext.registerDataFrameAsTable(aliasedDf, "t2");
sparkHiveContext.registerDataFrameAsTable(aliasedDf, "t3");

sparkHiveContext.sql("select a.F1 from t2 a inner join t3 b on a.F2=b.F2");

{code}

h2. Observations :

* If F1 (the exact name of the field) is used instead of f1, the code works correctly.
* If an alias is not used for F2, the code also works irrespective of the case of F1.
* If field F2 is not used in the final query, the code also works correctly.
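The first observation points at a workaround. A minimal Scala sketch (the report's code is Java; sparkHiveContext is assumed to be the same HiveContext, with t1 registered as above):
{code}
// Reference the column with its exact case ("F1" instead of "f1") in the
// aliasing query; per the observations, the later self-join then succeeds.
val aliasedDf = sparkHiveContext.sql("select F1, F2 as F2 from t1")
aliasedDf.registerTempTable("t2")
aliasedDf.registerTempTable("t3")
sparkHiveContext.sql("select a.F1 from t2 a inner join t3 b on a.F2=b.F2")
{code}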



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11633) HiveContext throws TreeNode Exception : Failed to Copy Node

2015-11-10 Thread Saurabh Santhosh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Santhosh updated SPARK-11633:
-
Affects Version/s: 1.5.0

> HiveContext throws TreeNode Exception : Failed to Copy Node
> ---
>
> Key: SPARK-11633
> URL: https://issues.apache.org/jira/browse/SPARK-11633
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0, 1.5.1
>Reporter: Saurabh Santhosh
>Priority: Critical
>
> h2. HiveContext#sql is throwing the following exception in a specific 
> scenario :
> h2. Exception :
> Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
> Failed to copy node.
> Is otherCopyArgs specified correctly for LogicalRDD.
> Exception message: wrong number of arguments
> ctor: public org.apache.spark.sql.execution.LogicalRDD
> (scala.collection.Seq,org.apache.spark.rdd.RDD,org.apache.spark.sql.SQLContext)?
> h2. Code :
> {code:title=SparkClient.java|borderStyle=solid}
> StructField[] fields = new StructField[2];
> fields[0] = new StructField("F1", DataTypes.StringType, true, 
> Metadata.empty());
> fields[1] = new StructField("F2", DataTypes.StringType, true, 
> Metadata.empty());
> 
> JavaRDD rdd = 
> javaSparkContext.parallelize(Arrays.asList(RowFactory.create("", "", 0)));
> DataFrame df = sparkHiveContext.createDataFrame(rdd, new StructType(fields));
> sparkHiveContext.registerDataFrameAsTable(df, "t1");
> DataFrame aliasedDf = sparkHiveContext.sql("select f1, F2 as F2 from t1");
> sparkHiveContext.registerDataFrameAsTable(aliasedDf, "t2");
> sparkHiveContext.registerDataFrameAsTable(aliasedDf, "t3");
> sparkHiveContext.sql("select a.F1 from t2 a inner join t3 b on a.F2=b.F2");
> {code}
> h2. Observations :
> * if F1(exact name of field) is used instead of f1, the code works correctly.
> * If alias is not used for F2, then also code works irrespective of case of 
> F1.
> * if Field F2 is not used in the final query also the code works correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11633) HiveContext throws TreeNode Exception : Failed to Copy Node

2015-11-10 Thread Saurabh Santhosh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Santhosh updated SPARK-11633:
-
Affects Version/s: 1.5.1

> HiveContext throws TreeNode Exception : Failed to Copy Node
> ---
>
> Key: SPARK-11633
> URL: https://issues.apache.org/jira/browse/SPARK-11633
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0, 1.5.1
>Reporter: Saurabh Santhosh
>Priority: Critical
>
> h2. HiveContext#sql is throwing the following exception in a specific 
> scenario :
> h2. Exception :
> Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
> Failed to copy node.
> Is otherCopyArgs specified correctly for LogicalRDD.
> Exception message: wrong number of arguments
> ctor: public org.apache.spark.sql.execution.LogicalRDD
> (scala.collection.Seq,org.apache.spark.rdd.RDD,org.apache.spark.sql.SQLContext)?
> h2. Code :
> {code:title=SparkClient.java|borderStyle=solid}
> StructField[] fields = new StructField[2];
> fields[0] = new StructField("F1", DataTypes.StringType, true, 
> Metadata.empty());
> fields[1] = new StructField("F2", DataTypes.StringType, true, 
> Metadata.empty());
> 
> JavaRDD rdd = 
> javaSparkContext.parallelize(Arrays.asList(RowFactory.create("", "", 0)));
> DataFrame df = sparkHiveContext.createDataFrame(rdd, new StructType(fields));
> sparkHiveContext.registerDataFrameAsTable(df, "t1");
> DataFrame aliasedDf = sparkHiveContext.sql("select f1, F2 as F2 from t1");
> sparkHiveContext.registerDataFrameAsTable(aliasedDf, "t2");
> sparkHiveContext.registerDataFrameAsTable(aliasedDf, "t3");
> sparkHiveContext.sql("select a.F1 from t2 a inner join t3 b on a.F2=b.F2");
> {code}
> h2. Observations :
> * if F1(exact name of field) is used instead of f1, the code works correctly.
> * If alias is not used for F2, then also code works irrespective of case of 
> F1.
> * if Field F2 is not used in the final query also the code works correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11633) HiveContext throws TreeNode Exception : Failed to Copy Node

2015-11-10 Thread Saurabh Santhosh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Santhosh updated SPARK-11633:
-
Description: 
h2. HiveContext#sql is throwing the following exception in a specific scenario :

h2. Exception :

Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
Failed to copy node.
Is otherCopyArgs specified correctly for LogicalRDD.
Exception message: wrong number of arguments
ctor: public org.apache.spark.sql.execution.LogicalRDD
(scala.collection.Seq,org.apache.spark.rdd.RDD,org.apache.spark.sql.SQLContext)?

h2. Code :

{code:title=SparkClient.java|borderStyle=solid}
StructField[] fields = new StructField[2];
fields[0] = new StructField("F1", DataTypes.StringType, true, Metadata.empty());
fields[1] = new StructField("F2", DataTypes.StringType, true, Metadata.empty());

JavaRDD rdd = 
javaSparkContext.parallelize(Arrays.asList(RowFactory.create("", "")));

DataFrame df = sparkHiveContext.createDataFrame(rdd, new StructType(fields));
sparkHiveContext.registerDataFrameAsTable(df, "t1");

DataFrame aliasedDf = sparkHiveContext.sql("select f1, F2 as F2 from t1");

sparkHiveContext.registerDataFrameAsTable(aliasedDf, "t2");
sparkHiveContext.registerDataFrameAsTable(aliasedDf, "t3");

sparkHiveContext.sql("select a.F1 from t2 a inner join t3 b on a.F2=b.F2");

{code}

h2. Observations :

* if F1(exact name of field) is used instead of f1, the code works correctly.
* If alias is not used for F2, then also code works irrespective of case of F1.
* if Field F2 is not used in the final query also the code works correctly.

  was:
h2. HiveContext#sql is throwing the following exception in a specific scenario :

h2. Exception :

Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
Failed to copy node.
Is otherCopyArgs specified correctly for LogicalRDD.
Exception message: wrong number of arguments
ctor: public org.apache.spark.sql.execution.LogicalRDD
(scala.collection.Seq,org.apache.spark.rdd.RDD,org.apache.spark.sql.SQLContext)?

h2. Code :

{code:title=SparkClient.java|borderStyle=solid}
StructField[] fields = new StructField[2];
fields[0] = new StructField("F1", DataTypes.StringType, true, Metadata.empty());
fields[1] = new StructField("F2", DataTypes.StringType, true, Metadata.empty());

JavaRDD rdd = 
javaSparkContext.parallelize(Arrays.asList(RowFactory.create("", "", 0)));

DataFrame df = sparkHiveContext.createDataFrame(rdd, new StructType(fields));
sparkHiveContext.registerDataFrameAsTable(df, "t1");

DataFrame aliasedDf = sparkHiveContext.sql("select f1, F2 as F2 from t1");

sparkHiveContext.registerDataFrameAsTable(aliasedDf, "t2");
sparkHiveContext.registerDataFrameAsTable(aliasedDf, "t3");

sparkHiveContext.sql("select a.F1 from t2 a inner join t3 b on a.F2=b.F2");

{code}

h2. Observations :

* if F1(exact name of field) is used instead of f1, the code works correctly.
* If alias is not used for F2, then also code works irrespective of case of F1.
* if Field F2 is not used in the final query also the code works correctly.


> HiveContext throws TreeNode Exception : Failed to Copy Node
> ---
>
> Key: SPARK-11633
> URL: https://issues.apache.org/jira/browse/SPARK-11633
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0, 1.5.1
>Reporter: Saurabh Santhosh
>Priority: Critical
>
> h2. HiveContext#sql is throwing the following exception in a specific 
> scenario :
> h2. Exception :
> Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
> Failed to copy node.
> Is otherCopyArgs specified correctly for LogicalRDD.
> Exception message: wrong number of arguments
> ctor: public org.apache.spark.sql.execution.LogicalRDD
> (scala.collection.Seq,org.apache.spark.rdd.RDD,org.apache.spark.sql.SQLContext)?
> h2. Code :
> {code:title=SparkClient.java|borderStyle=solid}
> StructField[] fields = new StructField[2];
> fields[0] = new StructField("F1", DataTypes.StringType, true, 
> Metadata.empty());
> fields[1] = new StructField("F2", DataTypes.StringType, true, 
> Metadata.empty());
> 
> JavaRDD rdd = 
> javaSparkContext.parallelize(Arrays.asList(RowFactory.create("", "")));
> DataFrame df = sparkHiveContext.createDataFrame(rdd, new StructType(fields));
> sparkHiveContext.registerDataFrameAsTable(df, "t1");
> DataFrame aliasedDf = sparkHiveContext.sql("select f1, F2 as F2 from t1");
> sparkHiveContext.registerDataFrameAsTable(aliasedDf, "t2");
> sparkHiveContext.registerDataFrameAsTable(aliasedDf, "t3");
> sparkHiveContext.sql("select a.F1 from t2 a inner join t3 b on a.F2=b.F2");
> {code}
> h2. Observations :
> * if F1(exact name of field) is used instead of f1, the code works correctly.
> * If alias is not used for F2, then also code works 

[jira] [Commented] (SPARK-6988) Fix Spark SQL documentation for 1.3.x

2015-06-02 Thread Saurabh Santhosh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568705#comment-14568705
 ] 

Saurabh Santhosh commented on SPARK-6988:
-

Hey,
Can someone update the Spark documentation as well (for correct usage of 
DataFrames)?

https://spark.apache.org/docs/latest/sql-programming-guide.html

Eg: 

DataFrame teenagers = sqlContext.sql("SELECT name FROM parquetFile WHERE age >= 
13 AND age <= 19");
List<String> teenagerNames = teenagers.map(new Function<Row, String>() {
  public String call(Row row) {
    return "Name: " + row.getString(0);
  }
}).collect();

to change teenagers.map to teenagers.javaRDD().map

 Fix Spark SQL documentation for 1.3.x
 -

 Key: SPARK-6988
 URL: https://issues.apache.org/jira/browse/SPARK-6988
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0, 1.3.1
Reporter: Olivier Girardot
Assignee: Olivier Girardot
Priority: Minor
 Fix For: 1.3.2, 1.4.0


 There are a few glitches regarding the DataFrame API usage in Java.
 The most important one being how to map a DataFrame result, using the javaRDD 
 method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6988) Fix Spark SQL documentation for 1.3.x

2015-06-02 Thread Saurabh Santhosh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568704#comment-14568704
 ] 

Saurabh Santhosh commented on SPARK-6988:
-

Hey, can someone update the Spark documentation as well?

https://spark.apache.org/docs/latest/sql-programming-guide.html

DataFrame teenagers = sqlContext.sql("SELECT name FROM parquetFile WHERE age >= 
13 AND age <= 19");
List<String> teenagerNames = teenagers.map(new Function<Row, String>() {
  public String call(Row row) {
    return "Name: " + row.getString(0);
  }
}).collect();

to make it teenagers.javaRDD()

 Fix Spark SQL documentation for 1.3.x
 -

 Key: SPARK-6988
 URL: https://issues.apache.org/jira/browse/SPARK-6988
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0, 1.3.1
Reporter: Olivier Girardot
Assignee: Olivier Girardot
Priority: Minor
 Fix For: 1.3.2, 1.4.0


 There are a few glitches regarding the DataFrame API usage in Java.
 The most important one being how to map a DataFrame result, using the javaRDD 
 method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1823) ExternalAppendOnlyMap can still OOM if one key is very large

2015-03-03 Thread Saurabh Santhosh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345080#comment-14345080
 ] 

Saurabh Santhosh commented on SPARK-1823:
-

Hey, what is the status on this issue? Is there another ticket for it?

 ExternalAppendOnlyMap can still OOM if one key is very large
 

 Key: SPARK-1823
 URL: https://issues.apache.org/jira/browse/SPARK-1823
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Andrew Or

 If the values for one key do not collectively fit into memory, then the map 
 will still OOM when you merge the spilled contents back in.
 This is a problem especially for PySpark, since we hash the keys (Python 
 objects) before a shuffle, and there are only so many integers out there in 
 the world, so there could potentially be many collisions.
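
To make the failure mode concrete, here is a hedged Java sketch (not from the report; names and sizes invented, and it assumes a JavaSparkContext named sc) of a single hot key whose values cannot collectively fit in memory:
{code:java}
// Illustration only: every record shares key 0, so all of its ~1 MB values must be
// gathered (and merged back from any spill files) for that one key, which is the
// case described above that can still OOM.
import java.util.Collections;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

List<Integer> rows = Collections.nCopies(1000000, 1);
JavaRDD<Integer> ids = sc.parallelize(rows);
JavaPairRDD<Integer, byte[]> skewed =
    ids.mapToPair(new PairFunction<Integer, Integer, byte[]>() {
      public Tuple2<Integer, byte[]> call(Integer i) {
        return new Tuple2<Integer, byte[]>(0, new byte[1024 * 1024]); // one hot key
      }
    });
skewed.groupByKey().count(); // all values for key 0 end up in one in-memory map
{code}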






[jira] [Updated] (SPARK-4811) Custom UDTFs not working in Spark SQL

2014-12-10 Thread Saurabh Santhosh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Santhosh updated SPARK-4811:

Priority: Critical  (was: Major)

 Custom UDTFs not working in Spark SQL
 -

 Key: SPARK-4811
 URL: https://issues.apache.org/jira/browse/SPARK-4811
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0, 1.1.1
Reporter: Saurabh Santhosh
Priority: Critical
 Fix For: 1.2.0


 I am using the Thrift server interface to Spark SQL and using beeline to 
 connect to it.
 I tried Spark SQL versions 1.1.0 and 1.1.1 and both are throwing the 
 following exception when using any custom UDTF.
 These are the steps I did:
 *Created a UDTF 'com.x.y.xxx'.*
 Registered the UDTF using the following query: 
 *create temporary function xxx as 'com.x.y.xxx'*
 The registration went through without any errors. But when I tried executing 
 the UDTF, I got the following error:
 *java.lang.ClassNotFoundException: xxx*
 The funny thing is that it's trying to load the function name instead of the 
 function class. The exception is at *line no: 81 in hiveUdfs.scala*.
 I have been at it for quite a long time.






[jira] [Created] (SPARK-4811) Custom UDTFs not working in Spark SQL

2014-12-09 Thread Saurabh Santhosh (JIRA)
Saurabh Santhosh created SPARK-4811:
---

 Summary: Custom UDTFs not working in Spark SQL
 Key: SPARK-4811
 URL: https://issues.apache.org/jira/browse/SPARK-4811
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0, 1.1.1
Reporter: Saurabh Santhosh
 Fix For: 1.2.0


I am using the Thrift server interface to Spark SQL and using beeline to 
connect to it.
I tried Spark SQL versions 1.1.0 and 1.1.1 and both are throwing the following 
exception when using any custom UDTF.

These are the steps I did:

*Created a UDTF 'com.x.y.xxx'.*

Registered the UDTF using the following query: 
*create temporary function xxx as 'com.x.y.xxx'*

The registration went through without any errors. But when I tried executing 
the UDTF, I got the following error:

*java.lang.ClassNotFoundException: xxx*

The funny thing is that it's trying to load the function name instead of the 
function class. The exception is at *line no: 81 in hiveUdfs.scala*.
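
For context, here is a minimal sketch of the kind of Hive GenericUDTF being registered (the class name and output column are invented for illustration; the real com.x.y.xxx implementation is not shown in the report):
{code:java}
// Hypothetical minimal UDTF that emits one string column per input row.
package com.x.y;

import java.util.Collections;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class SampleUdtf extends GenericUDTF {

  @Override
  public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
    // Declare a single output column named "value" of type string.
    return ObjectInspectorFactory.getStandardStructObjectInspector(
        Collections.singletonList("value"),
        Collections.<ObjectInspector>singletonList(
            PrimitiveObjectInspectorFactory.javaStringObjectInspector));
  }

  @Override
  public void process(Object[] args) throws HiveException {
    // Forward one output row containing the stringified first argument.
    forward(new Object[] { String.valueOf(args[0]) });
  }

  @Override
  public void close() throws HiveException {
    // Nothing to clean up in this sketch.
  }
}
{code}
The reported symptom is that, after *create temporary function xxx as 'com.x.y.xxx'*, the Thrift server tries to class-load the function name (xxx) rather than the registered class, hence the ClassNotFoundException.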








[jira] [Updated] (SPARK-4811) Custom UDTFs not working in Spark SQL

2014-12-09 Thread Saurabh Santhosh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Santhosh updated SPARK-4811:

Description: 
I am using the Thrift server interface to Spark SQL and using beeline to 
connect to it.
I tried Spark SQL versions 1.1.0 and 1.1.1 and both are throwing the following 
exception when using any custom UDTF.

These are the steps I did:

*Created a UDTF 'com.x.y.xxx'.*

Registered the UDTF using the following query: 
*create temporary function xxx as 'com.x.y.xxx'*

The registration went through without any errors. But when I tried executing 
the UDTF, I got the following error:

*java.lang.ClassNotFoundException: xxx*

The funny thing is that it's trying to load the function name instead of the 
function class. The exception is at *line no: 81 in hiveUdfs.scala*.
I have been at it for quite a long time.


  was:
I am using the Thrift server interface to Spark SQL and using beeline to 
connect to it.
I tried Spark SQL versions 1.1.0 and 1.1.1 and both are throwing the following 
exception when using any custom UDTF.

These are the steps I did:

*Created a UDTF 'com.x.y.xxx'.*

Registered the UDTF using the following query: 
*create temporary function xxx as 'com.x.y.xxx'*

The registration went through without any errors. But when I tried executing 
the UDTF, I got the following error:

*java.lang.ClassNotFoundException: xxx*

The funny thing is that it's trying to load the function name instead of the 
function class. The exception is at *line no: 81 in hiveUdfs.scala*.




 Custom UDTFs not working in Spark SQL
 -

 Key: SPARK-4811
 URL: https://issues.apache.org/jira/browse/SPARK-4811
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0, 1.1.1
Reporter: Saurabh Santhosh
 Fix For: 1.2.0


 I am using the Thrift server interface to Spark SQL and using beeline to 
 connect to it.
 I tried Spark SQL versions 1.1.0 and 1.1.1 and both are throwing the 
 following exception when using any custom UDTF.
 These are the steps I did:
 *Created a UDTF 'com.x.y.xxx'.*
 Registered the UDTF using the following query: 
 *create temporary function xxx as 'com.x.y.xxx'*
 The registration went through without any errors. But when I tried executing 
 the UDTF, I got the following error:
 *java.lang.ClassNotFoundException: xxx*
 The funny thing is that it's trying to load the function name instead of the 
 function class. The exception is at *line no: 81 in hiveUdfs.scala*.
 I have been at it for quite a long time.






[jira] [Commented] (SPARK-3582) Spark SQL having issue with existing Hive UDFs which take Map as a parameter

2014-09-29 Thread Saurabh Santhosh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151511#comment-14151511
 ] 

Saurabh Santhosh commented on SPARK-3582:
-

Issue resolved by Pull request :
https://github.com/apache/spark/pull/2506

 Spark SQL having issue with existing Hive UDFs which take Map as a parameter
 

 Key: SPARK-3582
 URL: https://issues.apache.org/jira/browse/SPARK-3582
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Saurabh Santhosh
Assignee: Adrian Wang
 Fix For: 1.2.0


 I have a UDF with the following evaluate method:
 public Text evaluate(Text argument, Map<Text, Text> params)
 And when I tried invoking this UDF, I was getting the following error:
 scala.MatchError: interface java.util.Map (of class java.lang.Class)
 at 
 org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:35)
 at 
 org.apache.spark.sql.hive.HiveFunctionRegistry.javaClassToDataType(hiveUdfs.scala:37)
 I had a look at HiveInspectors.scala and was not able to find any resolver for 
 java.util.Map.






[jira] [Updated] (SPARK-3582) Spark SQL having issue with existing Hive UDFs which take Map as a parameter

2014-09-17 Thread Saurabh Santhosh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh Santhosh updated SPARK-3582:

Summary: Spark SQL having issue with existing Hive UDFs which take Map as a 
parameter  (was: Spark SQL hving issue with existing Hive UDFs which take Map 
as a parameter)

 Spark SQL having issue with existing Hive UDFs which take Map as a parameter
 

 Key: SPARK-3582
 URL: https://issues.apache.org/jira/browse/SPARK-3582
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Saurabh Santhosh

 I have a UDF with the following evaluate method:
 public Text evaluate(Text argument, Map<Text, Text> params)
 And when I tried invoking this UDF, I was getting the following error:
 scala.MatchError: interface java.util.Map (of class java.lang.Class)
 at 
 org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:35)
 at 
 org.apache.spark.sql.hive.HiveFunctionRegistry.javaClassToDataType(hiveUdfs.scala:37)
 I had a look at HiveInspectors.scala and was not able to find any resolver for 
 java.util.Map.






[jira] [Created] (SPARK-3582) Spark SQL hving issue with existing Hive UDFs which take Map as a parameter

2014-09-17 Thread Saurabh Santhosh (JIRA)
Saurabh Santhosh created SPARK-3582:
---

 Summary: Spark SQL hving issue with existing Hive UDFs which take 
Map as a parameter
 Key: SPARK-3582
 URL: https://issues.apache.org/jira/browse/SPARK-3582
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Saurabh Santhosh


I have a UDF with the following evaluate method:
public Text evaluate(Text argument, Map<Text, Text> params)

And when I tried invoking this UDF, I was getting the following error:

scala.MatchError: interface java.util.Map (of class java.lang.Class)
at 
org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:35)
at 
org.apache.spark.sql.hive.HiveFunctionRegistry.javaClassToDataType(hiveUdfs.scala:37)

I had a look at HiveInspectors.scala and was not able to find any resolver for 
java.util.Map.






[jira] [Commented] (SPARK-3582) Spark SQL having issue with existing Hive UDFs which take Map as a parameter

2014-09-17 Thread Saurabh Santhosh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138559#comment-14138559
 ] 

Saurabh Santhosh commented on SPARK-3582:
-

When I changed the parameter to Object, it works.
The funny thing is that when I print the class of the runtime instance, it 
shows 'java.util.HashMap'.
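
As a rough sketch of that workaround (class and field names invented for illustration, not from the report), the signature change would look something like this:
{code:java}
// Declaring the second parameter as Object sidesteps the javaClassToDataType MatchError
// on java.util.Map; per the comment above, the runtime instance is a java.util.HashMap,
// so it can simply be cast back to a Map inside evaluate().
import java.util.Map;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class SampleMapUdf extends UDF {

  @SuppressWarnings("unchecked")
  public Text evaluate(Text argument, Object params) {
    Map<Text, Text> map = (Map<Text, Text>) params;
    Text value = map.get(argument);
    return value != null ? value : argument;
  }
}
{code}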

 Spark SQL having issue with existing Hive UDFs which take Map as a parameter
 

 Key: SPARK-3582
 URL: https://issues.apache.org/jira/browse/SPARK-3582
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Saurabh Santhosh

 I have a UDF with the following evaluate method:
 public Text evaluate(Text argument, Map<Text, Text> params)
 And when I tried invoking this UDF, I was getting the following error:
 scala.MatchError: interface java.util.Map (of class java.lang.Class)
 at 
 org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:35)
 at 
 org.apache.spark.sql.hive.HiveFunctionRegistry.javaClassToDataType(hiveUdfs.scala:37)
 I had a look at HiveInspectors.scala and was not able to find any resolver for 
 java.util.Map.


