[ https://issues.apache.org/jira/browse/SPARK-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603884#comment-14603884 ]

Sudhakar Thota commented on SPARK-4678:
---------------------------------------

Hi Tsuyoshi,

There was a SQL framing issue with the above statement; otherwise it works.
Please take a look at the demo that I have built for this purpose. Please run
it on your system to see if it works, and let us know if you find any issues.

1. Create a text file.

bash-3.2$ cat /Users/sudhakarthota/dataspark/explode/words1.txt 
One beautiful spring morning, a merchant loaded his donkey with bags of salt to 
go to the market in order to sell them.
The merchant and his donkey were walking along together. They had not walked 
far when they reached a river on the road.
Unfortunately, the donkey slipped and fell into the river and noticed that the 
bags of salt loaded on his back became lighter.
There was nothing the merchant could do, except return home where he loaded his 
donkey with more bags of salt.
As they reached the slippery riverbank, now deliberately, the donkey fell into 
the river and wasted all the bags of salt on its back again.
The merchant quickly discovered the donkey’s trick. He then returned home again 
but re-loaded his donkey with bags of sponges.
The foolish, tricky donkey again set on its way. On reaching the river he again 
fell into the water. But instead of the load becoming lighter, it became 
heavier.
The merchant laughed at him and said: “You foolish donkey, your trick had been 
discovered, you should know that, those who are too clever sometimes over reach 
themselves.”
See more at: 
http://www.kidsworldfun.com/shortstories_amerchantandhisdonkey.php#sthash.sAD67ccC.dpuf

2. Create an external table over it.

CREATE EXTERNAL TABLE words (text STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/Users/sudhakarthota/dataspark/explode';
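
Note: because this DDL declares FIELDS TERMINATED BY ',', each line of the
story is cut off at its first comma, which is what produces the "Extra bytes
detected at the end of the row" warning in step 3. If you want whole lines in
the single text column, a minimal alternative would drop the field delimiter;
this is only my sketch, and the table name words2 is hypothetical:

-- Sketch: single-column table with no field delimiter, so commas survive.
CREATE EXTERNAL TABLE words2 (text STRING)
ROW FORMAT DELIMITED LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/Users/sudhakarthota/dataspark/explode';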

3. Create table wordcount1 from it to hold the word counts.

spark-sql> CREATE TABLE wordcount1 AS SELECT tmp.word, count(1) AS count FROM (SELECT explode(split(lcase(text), ' ')) AS word FROM words) tmp GROUP BY tmp.word;
15/06/26 18:28:37 WARN LazyStruct: Extra bytes detected at the end of the row! 
Ignoring similar problems.
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted file:///user/hive/warehouse/wordcount1
Time taken: 0.993 seconds
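
For comparison with Tsuyoshi's original statement, the same CTAS pattern with
the punctuation-stripping REGEXP_REPLACE added also runs against the demo
table; a sketch (the table name wordcount2 is mine):

spark-sql> CREATE TABLE wordcount2 AS SELECT tmp.word, count(1) AS count FROM (SELECT explode(split(lcase(regexp_replace(text, '[\\p{Punct},\\p{Cntrl}]', '')), ' ')) AS word FROM words) tmp GROUP BY tmp.word;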

4. Verify the word counts.

spark-sql> select * from wordcount1;
http://www.kidsworldfun.com/shortstories_amerchantandhisdonkey.php#sthash.sad67ccc.dpuf 1
when    1
reached 2
had     1
morning 1
river   1
more    1
walking 1
trick.  1
with    1
said:   1
not     1
then    1
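
To spot-check the most frequent words first, a hypothetical follow-up query
(count is backquoted because it is also a function name):

spark-sql> SELECT word, `count` FROM wordcount1 ORDER BY `count` DESC LIMIT 10;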

> A SQL query with subquery fails with TreeNodeException
> ------------------------------------------------------
>
>                 Key: SPARK-4678
>                 URL: https://issues.apache.org/jira/browse/SPARK-4678
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.1.1
>            Reporter: Tsuyoshi Ozawa
>         Attachments: spark-4678-1.rtf
>
>
> {code}
> spark-sql> create external table if NOT EXISTS randomText100GB(text string) location 'hdfs:///user/ozawa/randomText100GB';
> spark-sql> CREATE TABLE wordcount AS
>          > SELECT word, count(1) AS count
>          > FROM (SELECT EXPLODE(SPLIT(LCASE(REGEXP_REPLACE(text,'[\\p{Punct},\\p{Cntrl}]','')),' '))
>          > AS word FROM randomText100GB) words
>          > GROUP BY word;
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in stage 1.0 (TID 25, hadoop-slave2.c.gcp-samples.internal): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: word#5
>         org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
>         org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:43)
>         org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:42)
>         org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
>         org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
>         org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:42)
>         org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:52)
>         org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:52)
>         scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>         scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>         scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>         scala.collection.AbstractTraversable.map(Traversable.scala:105)
>         org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.<init>(Projection.scala:52)
>         org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:106)
>         org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:106)
>         org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43)
>         org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:42)
>         org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>         org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
>         org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         org.apache.spark.scheduler.Task.run(Task.scala:54)
>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         java.lang.Thread.run(Thread.java:745)
> {code}


