[ https://issues.apache.org/jira/browse/SPARK-29186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937464#comment-16937464 ]
Tarun Khaneja commented on SPARK-29186:
---------------------------------------

[~hyukjin.kwon] I have cleaned up the code and added screenshots. Please let me know if anything else is required!

> SubqueryAlias name value is null in Spark 2.4.3 Logical plan.
> -------------------------------------------------------------
>
> Key: SPARK-29186
> URL: https://issues.apache.org/jira/browse/SPARK-29186
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.3
> Environment: I have tried this on AWS Glue with Spark 2.4.3 and on Windows 10 with Spark 2.4.4, and I face the same issue on both.
> Reporter: Tarun Khaneja
> Priority: Blocker
> Fix For: 2.2.1
>
> Attachments: image-2019-09-25-12-17-53-552.png, image-2019-09-25-12-21-52-136.png
>
>
> I am writing a program to analyze SQL queries, so I am using the Spark logical plan.
> Below is the code I am using:
>
> {code:java}
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.{SQLContext, SparkSession}
> import org.slf4j.LoggerFactory
>
> object QueryAnalyzer {
>   val LOG = LoggerFactory.getLogger(this.getClass)
>
>   // Spark conf
>   val conf = new SparkConf().setMaster("local[2]").setAppName("LocalEdlExecutor")
>   // Spark context
>   val sc = new SparkContext(conf)
>   // SQL context
>   val sqlContext = new SQLContext(sc)
>   // Spark session (getOrCreate reuses the SparkContext created above)
>   val sparkSession = SparkSession
>     .builder()
>     .appName("Spark User Data")
>     .config("spark.app.name", "LocalEdl")
>     .getOrCreate()
>
>   def main(args: Array[String]): Unit = {
>     var inputDfColumns = Map[String, List[String]]()
>     // Shared CSV reader; the CSV source reads the charset from "encoding"
>     val dfSession = sparkSession.read.format("csv")
>       .option("header", "true")
>       .option("inferSchema", "true")
>       .option("delimiter", ",")
>       .option("encoding", "utf8")
>       .option("multiLine", true)
>
>     val oDF = dfSession.load("C:\\Users\\tarun.khaneja\\data\\order.csv")
>     println("sample data in oDF ====>")
>     oDF.show()
>
>     val cusDF = dfSession.load("C:\\Users\\tarun.khaneja\\data\\customer.csv")
>     println("sample data in cusDF ====>")
>     cusDF.show()
>
>     oDF.createOrReplaceTempView("orderTempView")
>     cusDF.createOrReplaceTempView("customerTempView")
>
>     // get input columns from all dataframes
>     inputDfColumns += ("orderTempView" -> oDF.columns.toList)
>     inputDfColumns += ("customerTempView" -> cusDF.columns.toList)
>
>     val res = sqlContext.sql("""select OID, max(MID+CID) as MID_new, ROW_NUMBER() OVER (ORDER BY CID) as rn
>       from (select OID_1 as OID, CID_1 as CID, OID_1+CID_1 as MID
>             from (select min(ot.OrderID) as OID_1, ct.CustomerID as CID_1
>                   from orderTempView as ot
>                   inner join customerTempView as ct on ot.CustomerID = ct.CustomerID
>                   group by CID_1))
>       group by OID, CID""")
>     // show() prints the rows itself and returns Unit, so it is not wrapped in println
>     res.show(false)
>
>     val analyzedPlan = res.queryExecution.analyzed
>     println(analyzedPlan.prettyJson)
>   }
> }
> {code}
>
> Now the problem is: with *Spark 2.2.1* I get the JSON below, in which SubqueryAlias carries important information, namely the alias name of each table used in the query.
> !image-2019-09-25-12-17-53-552.png!
>
> But with Spark 2.4, the SubqueryAlias name is null, as shown in the JSON screenshot below.
>
> !image-2019-09-25-12-21-52-136.png!
>
> So I am not sure whether this is a bug in Spark 2.4 that causes the name to be null in SubqueryAlias.
> Or, if it is not a bug, how can I get the relation between an alias name and the real table name?
> Any idea on this?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
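On the second question (relating an alias to the real table name), one workaround that does not depend on `prettyJson` is to walk the analyzed plan directly. A plausible cause of the `null`, not confirmed here against the 2.4.3 source, is that 2.4 stores the `SubqueryAlias` name as an `AliasIdentifier` instead of a plain `String`, and the JSON serializer may not know how to render that type. The Scala sketch below is hedged on two assumptions: that `SubqueryAlias` still exposes an `alias: String` accessor in 2.4 (in 2.2 it is a constructor field), and that an aliased temp-view reference such as `orderTempView as ot` resolves to an outer `SubqueryAlias` (`ot`) whose direct child is a second `SubqueryAlias` named after the view. The object `AliasMapper`, the helper `demoPlan`, and the two-row in-memory tables standing in for order.csv and customer.csv are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias}

object AliasMapper {
  // For each SubqueryAlias whose direct child is another SubqueryAlias, pair the
  // outer (user-written) alias with the inner (view) alias.
  // Assumption: this is the analyzed shape of an aliased temp-view reference.
  def aliasToView(plan: LogicalPlan): Seq[(String, String)] =
    plan.collect {
      case outer: SubqueryAlias =>
        outer.child match {
          case inner: SubqueryAlias => Some(outer.alias -> inner.alias)
          case _                    => None
        }
    }.flatten

  // Every SubqueryAlias name in the plan, in pre-order (node before children).
  def allAliases(plan: LogicalPlan): Seq[String] =
    plan.collect { case s: SubqueryAlias => s.alias }

  // Hypothetical helper: registers two tiny in-memory views standing in for
  // order.csv and customer.csv, then returns the analyzed plan of an aliased join.
  def demoPlan(spark: SparkSession): LogicalPlan = {
    import spark.implicits._
    Seq((1, 10), (2, 20)).toDF("OrderID", "CustomerID")
      .createOrReplaceTempView("orderTempView")
    Seq((10, "a"), (20, "b")).toDF("CustomerID", "Name")
      .createOrReplaceTempView("customerTempView")
    spark.sql(
      """select ot.OrderID, ct.Name
        |from orderTempView as ot
        |inner join customerTempView as ct on ot.CustomerID = ct.CustomerID""".stripMargin
    ).queryExecution.analyzed
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("AliasMapper").getOrCreate()
    try {
      val plan = demoPlan(spark)
      // One line per aliased view, e.g. "<alias> -> <view name>"
      aliasToView(plan).foreach { case (alias, view) => println(s"$alias -> $view") }
      println(allAliases(plan).mkString(", "))
    } finally {
      spark.stop()
    }
  }
}
```

Spark normally lower-cases temp-view names unless `spark.sql.caseSensitive` is enabled, so the view side of each pair may not match the casing used in `createOrReplaceTempView`. Aliases that Spark generates for unnamed subqueries (such as the nested `from (select ...)` blocks in the query above) would typically wrap a `Project` or `Aggregate` rather than another `SubqueryAlias`, so `aliasToView` skips them while `allAliases` still lists them.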