[ https://issues.apache.org/jira/browse/SPARK-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Saurabh Santhosh updated SPARK-14948:
-------------------------------------
Description:

h2. The Spark analyzer throws the following exception in a specific scenario:

h2. Exception:
org.apache.spark.sql.AnalysisException: resolved attribute(s) F1#3 missing from asd#5,F2#4,F1#6,F2#7 in operator !Project [asd#5,F1#3];
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)

h2. Code:
{code:title=SparkClient.java|borderStyle=solid}
// Build a DataFrame with two nullable String columns, F1 and F2.
StructField[] fields = new StructField[2];
fields[0] = new StructField("F1", DataTypes.StringType, true, Metadata.empty());
fields[1] = new StructField("F2", DataTypes.StringType, true, Metadata.empty());
JavaRDD<Row> rdd = sparkClient.getJavaSparkContext()
    .parallelize(Arrays.asList(RowFactory.create("a", "b")));
DataFrame df = sparkClient.getSparkHiveContext().createDataFrame(rdd, new StructType(fields));
sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t1");

// Derive a second DataFrame from the same source, renaming F1 to asd.
DataFrame aliasedDf = sparkClient.getSparkHiveContext().sql("select F1 as asd, F2 from t1");
sparkClient.getSparkHiveContext().registerDataFrameAsTable(aliasedDf, "t2");
sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t3");

// Join the derived DataFrame back to the original and select one column from
// each side; collect() triggers the AnalysisException above.
DataFrame join = aliasedDf.join(df, aliasedDf.col("F2").equalTo(df.col("F2")), "inner");
DataFrame select = join.select(aliasedDf.col("asd"), df.col("F1"));
select.collect();
{code}

h2. Observations:
* The issue depends on the data type of the initial DataFrame's fields: if the data type is not String, the query works.
* The same query works if the DataFrames are registered as temporary tables and the join is expressed in SQL ({{select a.asd, b.F1 from t2 a inner join t3 b on a.F2 = b.F2}}).


> Exception when joining DataFrames derived from the same DataFrame
> -----------------------------------------------------------------
>
>                 Key: SPARK-14948
>                 URL: https://issues.apache.org/jira/browse/SPARK-14948
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Saurabh Santhosh


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
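The second observation can be turned into a concrete workaround. A minimal sketch against the same Spark 1.6 Java API, reusing the reporter's {{sparkClient}} and the temporary tables {{t2}} and {{t3}} registered in the reproduction code; the SQL form is the one the reporter confirmed to work, since the analyzer resolves both sides of the join by table alias rather than by the ambiguous attribute ids:

{code:title=Workaround.java|borderStyle=solid}
// Workaround (per the observations): express the self-join in SQL against the
// registered temporary tables instead of using DataFrame.col() references,
// which carry stale attribute ids after the join re-resolves the plan.
DataFrame viaSql = sparkClient.getSparkHiveContext().sql(
    "select a.asd, b.F1 from t2 a inner join t3 b on a.F2 = b.F2");
viaSql.collect();
{code}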