[ https://issues.apache.org/jira/browse/SPARK-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603884#comment-14603884 ]
Sudhakar Thota commented on SPARK-4678:
---------------------------------------

Hi Tsuyoshi,

There was a SQL framing issue with the above statement; otherwise it works. Please take a look at the demo file I built for this purpose, run it on your system to see if it works, and let us know if you find any issues.

1. Create a text file.

bash-3.2$ cat /Users/sudhakarthota/dataspark/explode/words1.txt
One beautiful spring morning, a merchant loaded his donkey with bags of salt to go to the market in order to sell them. The merchant and his donkey were walking along together. They had not walked far when they reached a river on the road. Unfortunately, the donkey slipped and fell into the river and noticed that the bags of salt loaded on his back became lighter. There was nothing the merchant could do, except return home where he loaded his donkey with more bags of salt. As they reached the slippery riverbank, now deliberately, the donkey fell into the river and wasted all the bags of salt on its back again. The merchant quickly discovered the donkey’s trick. He then returned home again but re-loaded his donkey with bags of sponges. The foolish, tricky donkey again set on its way. On reaching the river he again fell into the water. But instead of the load becoming lighter, it became heavier. The merchant laughed at him and said: “You foolish donkey, your trick had been discovered, you should know that, those who are too clever sometimes over reach themselves.” See more at: http://www.kidsworldfun.com/shortstories_amerchantandhisdonkey.php#sthash.sAD67ccC.dpuf

2. Create an external table around it.

CREATE EXTERNAL TABLE words (text STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/Users/sudhakarthota/dataspark/explode';

3. Create table wordcount1 from that table to hold the word counts.
spark-sql> CREATE TABLE wordcount1 AS
         > SELECT tmp.word, count(1) AS count
         > FROM (SELECT explode(split(lcase(text), ' ')) AS word FROM words) tmp
         > GROUP BY tmp.word;
15/06/26 18:28:37 WARN LazyStruct: Extra bytes detected at the end of the row! Ignoring similar problems.
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted file:///user/hive/warehouse/wordcount1
Time taken: 0.993 seconds

4. Verify the word counts.

spark-sql> select * from wordcount1;
http://www.kidsworldfun.com/shortstories_amerchantandhisdonkey.php#sthash.sad67ccc.dpuf	1
when	1
reached	2
had	1
morning	1
river	1
more	1
walking	1
trick.	1
with	1
said:	1
not	1
then	1

> A SQL query with subquery fails with TreeNodeException
> ------------------------------------------------------
>
>                 Key: SPARK-4678
>                 URL: https://issues.apache.org/jira/browse/SPARK-4678
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.1.1
>            Reporter: Tsuyoshi Ozawa
>         Attachments: spark-4678-1.rtf
>
>
> {code}
> spark-sql> create external table if NOT EXISTS randomText100GB(text string)
>          > location 'hdfs:///user/ozawa/randomText100GB';
> spark-sql> CREATE TABLE wordcount AS
>          > SELECT word, count(1) AS count
>          > FROM (SELECT
>          > EXPLODE(SPLIT(LCASE(REGEXP_REPLACE(text,'[\\p{Punct},\\p{Cntrl}]','')),' '))
>          > AS word FROM randomText100GB) words
>          > GROUP BY word;
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in stage 1.0 (TID 25, hadoop-slave2.c.gcp-samples.internal): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: word#5
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:43)
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:42)
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:42)
> org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:52)
> org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:52)
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> scala.collection.AbstractTraversable.map(Traversable.scala:105)
> org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.<init>(Projection.scala:52)
> org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:106)
> org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:106)
> org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43)
> org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:42)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:54)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
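For readers following along, the semantics the failing query intends (strip punctuation and control characters, lower-case, split on spaces, explode one row per word, then group and count) can be sketched outside Spark in plain Python. This is only an illustration of the intended result, not Spark code: the `rows` list stands in for the table, and the character class `[^\w\s]|[\x00-\x1f]` is an approximation of Hive's `[\\p{Punct},\\p{Cntrl}]`, since Python's `re` module does not support `\p{...}` classes.

```python
import re
from collections import Counter

def word_counts(rows):
    """Approximate:
    SELECT word, count(1) FROM
      (SELECT explode(split(lcase(regexp_replace(text, ..., '')), ' ')) AS word
       FROM t) tmp
    GROUP BY word
    """
    counts = Counter()
    for text in rows:
        # Roughly mimic REGEXP_REPLACE(text, '[\\p{Punct},\\p{Cntrl}]', '').
        cleaned = re.sub(r"[^\w\s]|[\x00-\x1f]", "", text)
        # lcase + split on single spaces, then "explode" one count per word.
        for word in cleaned.lower().split(" "):
            counts[word] += 1
    return counts

rows = ["One beautiful spring morning", "The merchant and his donkey"]
print(word_counts(rows))  # e.g. counts 'the' once, 'merchant' once, ...
```

Like Hive's `split(text, ' ')`, splitting on a single space keeps empty strings produced by runs of whitespace, so counts for the empty word can appear; that matches the behavior of the original query rather than being a bug in the sketch.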