[jira] [Closed] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-08-28 Thread huangyu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huangyu closed SPARK-15044.
---

> spark-sql will throw "input path does not exist" exception if it handles a 
> partition which exists in hive table, but the path is removed manually
> -
>
> Key: SPARK-15044
> URL: https://issues.apache.org/jira/browse/SPARK-15044
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1, 2.0.0
>Reporter: huangyu
>
> spark-sql will throw "input path not exist" exception if it handles a 
> partition which exists in hive table, but the path is removed manually.The 
> situation is as follows:
> 1) Create a table "test". "create table test (n string) partitioned by (p 
> string)"
> 2) Load some data into partition(p='1')
> 3)Remove the path related to partition(p='1') of table test manually. "hadoop 
> fs -rmr /warehouse//test/p=1"
> 4)Run spark sql, spark-sql -e "select n from test where p='1';"
> Then it throws exception:
> {code}
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> ./test/p=1
> at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
> at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
> at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> {code}
> The bug is in Spark 1.6.1; if I use Spark 1.4.0, it is OK.
> I think spark-sql should ignore the path, just like Hive does (and like Spark 
> itself did in earlier versions), rather than throw an exception.
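
For reference, the whole reproduction can be scripted end to end. A minimal 
sketch: the load step and the /tmp/data.txt sample file are assumptions, and 
the warehouse path is copied from the report, where the database segment is 
elided.

{code}
# 1) create the partitioned table
spark-sql -e "create table test (n string) partitioned by (p string);"
# 2) load some data into partition p='1' (sample file path is an assumption)
spark-sql -e "load data local inpath '/tmp/data.txt' into table test partition (p='1');"
# 3) remove the partition directory behind the metastore's back
hadoop fs -rmr /warehouse//test/p=1
# 4) query the partition; on 1.6.1 this fails with InvalidInputException
spark-sql -e "select n from test where p='1';"
{code}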



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-08-12 Thread huangyu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419798#comment-15419798
 ] 

huangyu commented on SPARK-15044:
-

Hi, I know it isn't Spark's fault. However, I think it would be better to just 
log an error rather than throw an exception, because I am often in this 
situation: a table has many partitions (maybe one partition per hour), and 
someone deletes the paths of many of them (I really don't know why they do 
this; maybe there are bugs in their program). spark-sql can't work until I fix 
the Hive metadata, but I can't run "alter table drop partition.." for all the 
missing partitions (there are too many to run). So I have no choice but to 
catch the exception and rebuild Spark.
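
For what it's worth, the mass "alter table drop partition" can be scripted 
rather than run by hand. A rough sketch, assuming single-column partitions of 
the form p=<value> and a known table location (both are assumptions; the 
location below mirrors the elided path in the report):

{code}
TABLE_DIR="/warehouse//test"            # substitute your table's location
hive -S -e "show partitions test;" | while read part; do   # lines look like p=1
  if ! hadoop fs -test -d "$TABLE_DIR/$part"; then         # directory gone?
    val=${part#p=}
    hive -e "alter table test drop partition (p='$val');"
  fi
done
{code}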

> spark-sql will throw "input path does not exist" exception if it handles a 
> partition which exists in hive table, but the path is removed manually
> -
>
> Key: SPARK-15044
> URL: https://issues.apache.org/jira/browse/SPARK-15044
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1, 2.0.0
>Reporter: huangyu
>
> spark-sql will throw "input path not exist" exception if it handles a 
> partition which exists in hive table, but the path is removed manually.The 
> situation is as follows:
> 1) Create a table "test". "create table test (n string) partitioned by (p 
> string)"
> 2) Load some data into partition(p='1')
> 3)Remove the path related to partition(p='1') of table test manually. "hadoop 
> fs -rmr /warehouse//test/p=1"
> 4)Run spark sql, spark-sql -e "select n from test where p='1';"
> Then it throws exception:
> {code}
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> ./test/p=1
> at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
> at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
> at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> {code}
> The bug is in Spark 1.6.1; if I use Spark 1.4.0, it is OK.
> I think spark-sql should ignore the path, just like Hive does (and like Spark 
> itself did in earlier versions), rather than throw an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-

[jira] [Comment Edited] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-05-03 Thread huangyu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268252#comment-15268252
 ] 

huangyu edited comment on SPARK-15044 at 5/3/16 7:26 AM:
-

Oh, I see, thank you. Of course, if I drop the partition after removing the 
path, it will work. But in my job I need to deal with many tables with 
thousands of partitions from different sources, and the above problem really 
bothers me. What's more, many spark-sql scripts that work well in version 
1.4.0 fail in 1.6.1.


was (Author: huang_yuu):
Oh, I see, thank you. Of course, if I drop the partition after removing the 
path, it will work. But in my job I need to deal with many tables with 
thousands of partitions from different sources, and the above problem really 
bothers me.

> spark-sql will throw "input path does not exist" exception if it handles a 
> partition which exists in hive table, but the path is removed manually
> -
>
> Key: SPARK-15044
> URL: https://issues.apache.org/jira/browse/SPARK-15044
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: huangyu
>
> spark-sql will throw "input path not exist" exception if it handles a 
> partition which exists in hive table, but the path is removed manually.The 
> situation is as follows:
> 1) Create a table "test". "create table test (n string) partitioned by (p 
> string)"
> 2) Load some data into partition(p='1')
> 3)Remove the path related to partition(p='1') of table test manually. "hadoop 
> fs -rmr /warehouse//test/p=1"
> 4)Run spark sql, spark-sql -e "select n from test where p='1';"
> Then it throws exception:
> {code}
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> ./test/p=1
> at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
> at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
> at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> {code}
> The bug is in Spark 1.6.1; if I use Spark 1.4.0, it is OK.
> I think spark-sql should ignore the path, just like Hive does (and like Spark 
> itself did in earlier versions), rather than throw an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

--

[jira] [Comment Edited] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-05-03 Thread huangyu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268252#comment-15268252
 ] 

huangyu edited comment on SPARK-15044 at 5/3/16 7:27 AM:
-

Oh, I see, thank you. Of course, if I drop the partition after removing the 
path, it will work. But in my job I need to deal with many tables with 
thousands of partitions from different sources, and the above problem really 
bothers me. What's more, many spark-sql scripts that work well in version 
1.4.0 fail in 1.6.1.


was (Author: huang_yuu):
Oh, I see, thank you. Of course, if I drop the partition after removing the 
path, it will work. But in my job I need to deal with many tables with 
thousands of partitions from different sources, and the above problem really 
bothers me. What's more, many spark-sql scripts that work well in version 
1.4.0 fails in 1.6.1.

> spark-sql will throw "input path does not exist" exception if it handles a 
> partition which exists in hive table, but the path is removed manually
> -
>
> Key: SPARK-15044
> URL: https://issues.apache.org/jira/browse/SPARK-15044
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: huangyu
>
> spark-sql will throw "input path not exist" exception if it handles a 
> partition which exists in hive table, but the path is removed manually.The 
> situation is as follows:
> 1) Create a table "test". "create table test (n string) partitioned by (p 
> string)"
> 2) Load some data into partition(p='1')
> 3)Remove the path related to partition(p='1') of table test manually. "hadoop 
> fs -rmr /warehouse//test/p=1"
> 4)Run spark sql, spark-sql -e "select n from test where p='1';"
> Then it throws exception:
> {code}
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> ./test/p=1
> at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
> at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
> at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> {code}
> The bug is in Spark 1.6.1; if I use Spark 1.4.0, it is OK.
> I think spark-sql should ignore the path, just like Hive does (and like Spark 
> itself did in earlier versions), rather than throw an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-05-03 Thread huangyu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268252#comment-15268252
 ] 

huangyu commented on SPARK-15044:
-

Oh, I see, thank you. Of course, if I drop the partition after removing the 
path, it will work. But in my job I need to deal with many tables with 
thousands of partitions from different sources, and the above problem really 
bothers me.

> spark-sql will throw "input path does not exist" exception if it handles a 
> partition which exists in hive table, but the path is removed manually
> -
>
> Key: SPARK-15044
> URL: https://issues.apache.org/jira/browse/SPARK-15044
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: huangyu
>
> spark-sql will throw "input path not exist" exception if it handles a 
> partition which exists in hive table, but the path is removed manually.The 
> situation is as follows:
> 1) Create a table "test". "create table test (n string) partitioned by (p 
> string)"
> 2) Load some data into partition(p='1')
> 3)Remove the path related to partition(p='1') of table test manually. "hadoop 
> fs -rmr /warehouse//test/p=1"
> 4)Run spark sql, spark-sql -e "select n from test where p='1';"
> Then it throws exception:
> {code}
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> ./test/p=1
> at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
> at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
> at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> {code}
> The bug is in Spark 1.6.1; if I use Spark 1.4.0, it is OK.
> I think spark-sql should ignore the path, just like Hive does (and like Spark 
> itself did in earlier versions), rather than throw an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-05-02 Thread huangyu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267974#comment-15267974
 ] 

huangyu edited comment on SPARK-15044 at 5/3/16 2:51 AM:
-

I removed the path with the hadoop command "hadoop fs -rmr 
/warehouse//test/p=1", not through Hive. That is to say, the partition exists 
in the Hive table, but the related path was removed manually, and Spark 1.6.1 
throws the exception.
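
A possible mitigation, assuming your build has the flag (to my knowledge 
spark.sql.hive.verifyPartitionPath exists in the 1.x line and defaults to 
false; treat that as an assumption, not something stated in this thread): ask 
Spark to verify partition paths and skip the missing ones before reading.

{code}
# hedged sketch: enable partition-path verification for a single query
spark-sql --conf spark.sql.hive.verifyPartitionPath=true \
  -e "select n from test where p='1';"
{code}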


was (Author: huang_yuu):
I removed the path with the hadoop command "hadoop fs -rmr 
/warehouse//test/p=1", not through Hive. That is to say, the partition exists 
in the Hive table, but the related path was removed manually, and Spark 
version 1.6.1 throws the exception.

> spark-sql will throw "input path does not exist" exception if it handles a 
> partition which exists in hive table, but the path is removed manually
> -
>
> Key: SPARK-15044
> URL: https://issues.apache.org/jira/browse/SPARK-15044
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: huangyu
>
> spark-sql will throw "input path not exist" exception if it handles a 
> partition which exists in hive table, but the path is removed manually.The 
> situation is as follows:
> 1) Create a table "test". "create table test (n string) partitioned by (p 
> string)"
> 2) Load some data into partition(p='1')
> 3)Remove the path related to partition(p='1') of table test manually. "hadoop 
> fs -rmr /warehouse//test/p=1"
> 4)Run spark sql, spark-sql -e "select n from test where p='1';"
> Then it throws exception:
> {code}
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> ./test/p=1
> at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
> at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
> at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> {code}
> The bug is in Spark 1.6.1; if I use Spark 1.4.0, it is OK.
> I think spark-sql should ignore the path, just like Hive does (and like Spark 
> itself did in earlier versions), rather than throw an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-05-02 Thread huangyu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267978#comment-15267978
 ] 

huangyu commented on SPARK-15044:
-

Maybe you can try it in Spark 1.6.1: when I tried in Spark 1.6.1 it threw the 
exception, but in Spark 1.4.0 it did not.

> spark-sql will throw "input path does not exist" exception if it handles a 
> partition which exists in hive table, but the path is removed manually
> -
>
> Key: SPARK-15044
> URL: https://issues.apache.org/jira/browse/SPARK-15044
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: huangyu
>
> spark-sql will throw "input path not exist" exception if it handles a 
> partition which exists in hive table, but the path is removed manually.The 
> situation is as follows:
> 1) Create a table "test". "create table test (n string) partitioned by (p 
> string)"
> 2) Load some data into partition(p='1')
> 3)Remove the path related to partition(p='1') of table test manually. "hadoop 
> fs -rmr /warehouse//test/p=1"
> 4)Run spark sql, spark-sql -e "select n from test where p='1';"
> Then it throws exception:
> {code}
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> ./test/p=1
> at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
> at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
> at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> {code}
> The bug is in Spark 1.6.1; if I use Spark 1.4.0, it is OK.
> I think spark-sql should ignore the path, just like Hive does (and like Spark 
> itself did in earlier versions), rather than throw an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-05-02 Thread huangyu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267974#comment-15267974
 ] 

huangyu commented on SPARK-15044:
-

I removed the path with the hadoop command "hadoop fs -rmr 
/warehouse//test/p=1", not through Hive. That is to say, the partition exists 
in the Hive table, but the related path was removed manually, and Spark 
version 1.6.1 throws the exception.

> spark-sql will throw "input path does not exist" exception if it handles a 
> partition which exists in hive table, but the path is removed manually
> -
>
> Key: SPARK-15044
> URL: https://issues.apache.org/jira/browse/SPARK-15044
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: huangyu
>
> spark-sql will throw "input path not exist" exception if it handles a 
> partition which exists in hive table, but the path is removed manually.The 
> situation is as follows:
> 1) Create a table "test". "create table test (n string) partitioned by (p 
> string)"
> 2) Load some data into partition(p='1')
> 3)Remove the path related to partition(p='1') of table test manually. "hadoop 
> fs -rmr /warehouse//test/p=1"
> 4)Run spark sql, spark-sql -e "select n from test where p='1';"
> Then it throws exception:
> {code}
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> ./test/p=1
> at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
> at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
> at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> {code}
> The bug is in spark 1.6.1, if I use spark 1.4.0, It is OK
> I think spark-sql should ignore the path, just like hive or it dose in early 
> versions, rather than throw an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-05-01 Thread huangyu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huangyu updated SPARK-15044:

Description: 
spark-sql will throw "input path not exist" exception if it handles a partition 
which exists in hive table, but the path is removed manually.The situation is 
as follows:

1) Create a table "test". "create table test (n string) partitioned by (p 
string)"
2) Load some data into partition(p='1')
3)Remove the path related to partition(p='1') of table test manually. "hadoop 
fs -rmr /warehouse//test/p=1"
4)Run spark sql, spark-sql -e "select n from test where p='1';"

Then it throws exception:

{code}
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
./test/p=1
at 
org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
{code}
The bug is in Spark 1.6.1; if I use Spark 1.4.0, it is OK.
I think spark-sql should ignore the path, just like Hive does (and like Spark 
itself did in earlier versions), rather than throw an exception.

  was:
spark-sql will throw "input path not exist" exception if it handles a partition 
which exists in hive table, but the path is removed manually.The situation is 
as follows:

1) Create a table "test". "create table test (n string) partitioned by (p 
string)"
2) Load some data into partition(p='1')
3)Remove the path related to partition(p='1') of table test manually. "hadoop 
fs -rmr /warehouse//test/p=1"
4)Run spark sql, spark-sql -e "select n from test where p='1';"

Then it throws exception:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
./test/p=1
at 
org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rd

[jira] [Updated] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-05-01 Thread huangyu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huangyu updated SPARK-15044:

Description: 
spark-sql will throw "input path not exist" exception if it handles a partition 
which exists in hive table, but the path is removed manually.The situation is 
as follows:

1) Create a table "test". "create table test (n string) partitioned by (p 
string)"
2) Load some data into partition(p='1')
3)Remove the path related to partition(p='1') of table test manually. "hadoop 
fs -rmr /warehouse//test/p=1"
4)Run spark sql, spark-sql -e "select n from test where p='1';"

Then it throws exception:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
./test/p=1
at 
org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)

The bug is in Spark 1.6.1; if I use Spark 1.4.0, it is OK.
I think spark-sql should ignore the path, just like Hive does (and like Spark 
itself did in earlier versions), rather than throw an exception.

  was:
spark-sql will throw "input path not exist" exception if it handles a partition 
which exists in hive table, but the path is removed manually.The situation is 
as follows:

1) Create a table "test". "create table test (n string) partitioned by (p 
string)"
2) Load some data into partition(p='1')
3)Remove the path related to partition(p='1') of table test manually. "hadoop 
fs -rmr /warehouse//test/p=1"
4)Run spark sql, spark-sql -e "select n from test where p='1';"

Then it throws exception:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
./test/p=1
at 
org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfu

[jira] [Created] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually

2016-05-01 Thread huangyu (JIRA)
huangyu created SPARK-15044:
---

 Summary: spark-sql will throw "input path does not exist" 
exception if it handles a partition which exists in hive table, but the path is 
removed manually
 Key: SPARK-15044
 URL: https://issues.apache.org/jira/browse/SPARK-15044
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.1
Reporter: huangyu


spark-sql will throw "input path not exist" exception if it handles a partition 
which exists in hive table, but the path is removed manually.The situation is 
as follows:

1) Create a table "test". "create table test (n string) partitioned by (p 
string)"
2) Load some data into partition(p='1')
3)Remove the path related to partition(p='1') of table test manually. "hadoop 
fs -rmr /warehouse//test/p=1"
4)Run spark sql, spark-sql -e "select n from test where p='1';"

Then it throws exception:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
./test/p=1
at 
org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)

This happens in Spark 1.6.1; if I use Spark 1.4.0, it is OK.
I think spark-sql should ignore the path, just like Hive does (and like Spark 
itself did in earlier versions), rather than throw an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-13652) TransportClient.sendRpcSync returns wrong results

2016-03-04 Thread huangyu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huangyu closed SPARK-13652.
---

This issue has been fixed by Shixiong Zhu
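
(My reading of the fix, stated as an assumption rather than something recorded 
here: sendRpcSync now copies the response ByteBuffer inside its onSuccess 
callback, because the Netty buffer backing the message is recycled once the 
handler returns; a client shared across threads could otherwise read another 
request's response bytes, which would explain the off-by-one output quoted 
below.)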

> TransportClient.sendRpcSync returns wrong results
> -
>
> Key: SPARK-13652
> URL: https://issues.apache.org/jira/browse/SPARK-13652
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: huangyu
>Assignee: Shixiong Zhu
> Fix For: 1.6.2, 2.0.0
>
> Attachments: RankHandler.java, Test.java
>
>
> TransportClient is not thread-safe, and if it is called from multiple threads 
> the messages can't be encoded and decoded correctly. Below is my code, and it 
> prints wrong messages.
> {code}
> public static void main(String[] args) throws IOException, InterruptedException {
>     // An echo server: RankHandler answers every RPC with the request bytes.
>     TransportServer server = new TransportContext(new TransportConf("test",
>             new MapConfigProvider(new HashMap<String, String>())), new RankHandler())
>             .createServer(8081, new LinkedList<TransportServerBootstrap>());
>     TransportContext context = new TransportContext(new TransportConf("test",
>             new MapConfigProvider(new HashMap<String, String>())), new NoOpRpcHandler(), true);
>     final TransportClientFactory clientFactory = context.createClientFactory();
>     List<Thread> ts = new ArrayList<>();
>     for (int i = 0; i < 10; i++) {
>         // Ten threads call sendRpcSync concurrently; the factory caches
>         // clients per address, so they end up sharing a TransportClient.
>         ts.add(new Thread(new Runnable() {
>             @Override
>             public void run() {
>                 for (int j = 0; j < 1000; j++) {
>                     try {
>                         ByteBuf buf = Unpooled.buffer(8);
>                         buf.writeLong((long) j);
>                         ByteBuffer byteBuffer = clientFactory.createClient("localhost", 8081)
>                                 .sendRpcSync(buf.nioBuffer(), Long.MAX_VALUE);
>                         long response = byteBuffer.getLong();
>                         if (response != j) {  // the echo should match what was sent
>                             System.err.println("send:" + j + ",response:" + response);
>                         }
>                     } catch (IOException e) {
>                         e.printStackTrace();
>                     }
>                 }
>             }
>         }));
>         ts.get(i).start();
>     }
>     for (Thread t : ts) {
>         t.join();
>     }
>     server.close();
> }
> 
> public class RankHandler extends RpcHandler {
>     private final Logger logger = LoggerFactory.getLogger(RankHandler.class);
>     private final StreamManager streamManager;
> 
>     public RankHandler() {
>         this.streamManager = new OneForOneStreamManager();
>     }
> 
>     @Override
>     public void receive(TransportClient client, ByteBuffer msg, RpcResponseCallback callback) {
>         callback.onSuccess(msg);  // echo the request back unchanged
>     }
> 
>     @Override
>     public StreamManager getStreamManager() {
>         return streamManager;
>     }
> }
> {code}
> It prints output like the following:
> send:221,response:222
> send:233,response:234
> send:312,response:313
> send:358,response:359
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13652) TransportClient.sendRpcSync returns wrong results

2016-03-03 Thread huangyu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179157#comment-15179157
 ] 

huangyu edited comment on SPARK-13652 at 3/4/16 3:09 AM:
-

I think the doc is about fetching a stream rather than sendRpc


was (Author: huang_yuu):
I think the docs are about fetching a stream rather than sendRpc

> TransportClient.sendRpcSync returns wrong results
> -
>
> Key: SPARK-13652
> URL: https://issues.apache.org/jira/browse/SPARK-13652
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: huangyu
> Attachments: RankHandler.java, Test.java
>
>
> TransportClient is not thread-safe, and if it is called from multiple threads 
> the messages can't be encoded and decoded correctly. Below is my code, and it 
> prints wrong messages.
> {code}
> public static void main(String[] args) throws IOException, InterruptedException {
>     // An echo server: RankHandler answers every RPC with the request bytes.
>     TransportServer server = new TransportContext(new TransportConf("test",
>             new MapConfigProvider(new HashMap<String, String>())), new RankHandler())
>             .createServer(8081, new LinkedList<TransportServerBootstrap>());
>     TransportContext context = new TransportContext(new TransportConf("test",
>             new MapConfigProvider(new HashMap<String, String>())), new NoOpRpcHandler(), true);
>     final TransportClientFactory clientFactory = context.createClientFactory();
>     List<Thread> ts = new ArrayList<>();
>     for (int i = 0; i < 10; i++) {
>         // Ten threads call sendRpcSync concurrently; the factory caches
>         // clients per address, so they end up sharing a TransportClient.
>         ts.add(new Thread(new Runnable() {
>             @Override
>             public void run() {
>                 for (int j = 0; j < 1000; j++) {
>                     try {
>                         ByteBuf buf = Unpooled.buffer(8);
>                         buf.writeLong((long) j);
>                         ByteBuffer byteBuffer = clientFactory.createClient("localhost", 8081)
>                                 .sendRpcSync(buf.nioBuffer(), Long.MAX_VALUE);
>                         long response = byteBuffer.getLong();
>                         if (response != j) {  // the echo should match what was sent
>                             System.err.println("send:" + j + ",response:" + response);
>                         }
>                     } catch (IOException e) {
>                         e.printStackTrace();
>                     }
>                 }
>             }
>         }));
>         ts.get(i).start();
>     }
>     for (Thread t : ts) {
>         t.join();
>     }
>     server.close();
> }
> 
> public class RankHandler extends RpcHandler {
>     private final Logger logger = LoggerFactory.getLogger(RankHandler.class);
>     private final StreamManager streamManager;
> 
>     public RankHandler() {
>         this.streamManager = new OneForOneStreamManager();
>     }
> 
>     @Override
>     public void receive(TransportClient client, ByteBuffer msg, RpcResponseCallback callback) {
>         callback.onSuccess(msg);  // echo the request back unchanged
>     }
> 
>     @Override
>     public StreamManager getStreamManager() {
>         return streamManager;
>     }
> }
> {code}
> It prints output like the following:
> send:221,response:222
> send:233,response:234
> send:312,response:313
> send:358,response:359
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13652) TransportClient.sendRpcSync returns wrong results

2016-03-03 Thread huangyu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179157#comment-15179157
 ] 

huangyu edited comment on SPARK-13652 at 3/4/16 2:53 AM:
-

I think the docs are about fetching a stream rather than sendRpc


was (Author: huang_yuu):
I think it is about fetching a stream rather than sendRpc

> TransportClient.sendRpcSync returns wrong results
> -
>
> Key: SPARK-13652
> URL: https://issues.apache.org/jira/browse/SPARK-13652
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: huangyu
> Attachments: RankHandler.java, Test.java
>
>
> TransportClient is not thread-safe, and if it is called from multiple threads 
> the messages can't be encoded and decoded correctly. Below is my code, and it 
> prints wrong messages.
> {code}
> public static void main(String[] args) throws IOException, InterruptedException {
>     // An echo server: RankHandler answers every RPC with the request bytes.
>     TransportServer server = new TransportContext(new TransportConf("test",
>             new MapConfigProvider(new HashMap<String, String>())), new RankHandler())
>             .createServer(8081, new LinkedList<TransportServerBootstrap>());
>     TransportContext context = new TransportContext(new TransportConf("test",
>             new MapConfigProvider(new HashMap<String, String>())), new NoOpRpcHandler(), true);
>     final TransportClientFactory clientFactory = context.createClientFactory();
>     List<Thread> ts = new ArrayList<>();
>     for (int i = 0; i < 10; i++) {
>         // Ten threads call sendRpcSync concurrently; the factory caches
>         // clients per address, so they end up sharing a TransportClient.
>         ts.add(new Thread(new Runnable() {
>             @Override
>             public void run() {
>                 for (int j = 0; j < 1000; j++) {
>                     try {
>                         ByteBuf buf = Unpooled.buffer(8);
>                         buf.writeLong((long) j);
>                         ByteBuffer byteBuffer = clientFactory.createClient("localhost", 8081)
>                                 .sendRpcSync(buf.nioBuffer(), Long.MAX_VALUE);
>                         long response = byteBuffer.getLong();
>                         if (response != j) {  // the echo should match what was sent
>                             System.err.println("send:" + j + ",response:" + response);
>                         }
>                     } catch (IOException e) {
>                         e.printStackTrace();
>                     }
>                 }
>             }
>         }));
>         ts.get(i).start();
>     }
>     for (Thread t : ts) {
>         t.join();
>     }
>     server.close();
> }
> 
> public class RankHandler extends RpcHandler {
>     private final Logger logger = LoggerFactory.getLogger(RankHandler.class);
>     private final StreamManager streamManager;
> 
>     public RankHandler() {
>         this.streamManager = new OneForOneStreamManager();
>     }
> 
>     @Override
>     public void receive(TransportClient client, ByteBuffer msg, RpcResponseCallback callback) {
>         callback.onSuccess(msg);  // echo the request back unchanged
>     }
> 
>     @Override
>     public StreamManager getStreamManager() {
>         return streamManager;
>     }
> }
> {code}
> It prints output like the following:
> send:221,response:222
> send:233,response:234
> send:312,response:313
> send:358,response:359
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13652) TransportClient.sendRpcSync returns wrong results

2016-03-03 Thread huangyu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179186#comment-15179186
 ] 

huangyu edited comment on SPARK-13652 at 3/4/16 2:39 AM:
-

I used your patch and it worked, great!!
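If I read the patch correctly, the key change is that the response buffer is copied while it is still valid, i.e. inside the callback, instead of being handed across threads after the callback returns. A minimal sketch of that pattern as I understand it (my paraphrase, not the exact patch; SettableFuture is Guava's, and client/message/timeoutMs stand for the arguments of sendRpcSync):

{code}
// Paraphrased sketch of the fix, not the exact patch: copy the response
// inside onSuccess, where the buffer is guaranteed to still be valid, and
// hand only the copy to the thread blocked in sendRpcSync.
final SettableFuture<ByteBuffer> result = SettableFuture.create();
client.sendRpc(message, new RpcResponseCallback() {
    @Override
    public void onSuccess(ByteBuffer response) {
        ByteBuffer copy = ByteBuffer.allocate(response.remaining());
        copy.put(response);
        copy.flip(); // make the copy readable from position 0
        result.set(copy);
    }

    @Override
    public void onFailure(Throwable e) {
        result.setException(e);
    }
});
return result.get(timeoutMs, TimeUnit.MILLISECONDS);
{code}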


was (Author: huang_yuu):
I use your patch and it worked, great!!

> TransportClient.sendRpcSync returns wrong results
> -
>
> Key: SPARK-13652
> URL: https://issues.apache.org/jira/browse/SPARK-13652
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: huangyu
> Attachments: RankHandler.java, Test.java
>
>
> TransportClient is not thread safe, and if it is called from multiple
> threads the messages can't be encoded and decoded correctly. Below is my
> code, and it will print wrong messages.
> {code}
> public static void main(String[] args) throws IOException, InterruptedException {
>     // Start an echo server; RankHandler (below) replies with the request payload.
>     TransportServer server = new TransportContext(
>             new TransportConf("test", new MapConfigProvider(new HashMap())),
>             new RankHandler())
>             .createServer(8081, new LinkedList());
>     // Client side: ten threads share one factory and issue synchronous RPCs.
>     TransportContext context = new TransportContext(
>             new TransportConf("test", new MapConfigProvider(new HashMap())),
>             new NoOpRpcHandler(), true);
>     final TransportClientFactory clientFactory = context.createClientFactory();
>     List<Thread> ts = new ArrayList<>();
>     for (int i = 0; i < 10; i++) {
>         ts.add(new Thread(new Runnable() {
>             @Override
>             public void run() {
>                 for (int j = 0; j < 1000; j++) {
>                     try {
>                         ByteBuf buf = Unpooled.buffer(8);
>                         buf.writeLong((long) j);
>                         ByteBuffer byteBuffer = clientFactory
>                                 .createClient("localhost", 8081)
>                                 .sendRpcSync(buf.nioBuffer(), Long.MAX_VALUE);
>                         long response = byteBuffer.getLong();
>                         // The echo should return j; anything else means the
>                         // response was delivered to the wrong request.
>                         if (response != j) {
>                             System.err.println("send:" + j + ",response:" + response);
>                         }
>                     } catch (IOException e) {
>                         e.printStackTrace();
>                     }
>                 }
>             }
>         }));
>         ts.get(i).start();
>     }
>     for (Thread t : ts) {
>         t.join();
>     }
>     server.close();
> }
>
> public class RankHandler extends RpcHandler {
>     private final Logger logger = LoggerFactory.getLogger(RankHandler.class);
>     private final StreamManager streamManager;
>
>     public RankHandler() {
>         this.streamManager = new OneForOneStreamManager();
>     }
>
>     @Override
>     public void receive(TransportClient client, ByteBuffer msg, RpcResponseCallback callback) {
>         // Echo the request payload straight back as the RPC response.
>         callback.onSuccess(msg);
>     }
>
>     @Override
>     public StreamManager getStreamManager() {
>         return streamManager;
>     }
> }
> {code}
> It will print output like the following:
> send:221,response:222
> send:233,response:234
> send:312,response:313
> send:358,response:359
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13652) TransportClient.sendRpcSync returns wrong results

2016-03-03 Thread huangyu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179186#comment-15179186
 ] 

huangyu commented on SPARK-13652:
-

I used your patch and it worked, great!!

> TransportClient.sendRpcSync returns wrong results
> -
>
> Key: SPARK-13652
> URL: https://issues.apache.org/jira/browse/SPARK-13652
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: huangyu
> Attachments: RankHandler.java, Test.java
>
>
> TransportClient is not thread safe, and if it is called from multiple
> threads the messages can't be encoded and decoded correctly. Below is my
> code, and it will print wrong messages.
> {code}
> public static void main(String[] args) throws IOException, InterruptedException {
>     // Start an echo server; RankHandler (below) replies with the request payload.
>     TransportServer server = new TransportContext(
>             new TransportConf("test", new MapConfigProvider(new HashMap())),
>             new RankHandler())
>             .createServer(8081, new LinkedList());
>     // Client side: ten threads share one factory and issue synchronous RPCs.
>     TransportContext context = new TransportContext(
>             new TransportConf("test", new MapConfigProvider(new HashMap())),
>             new NoOpRpcHandler(), true);
>     final TransportClientFactory clientFactory = context.createClientFactory();
>     List<Thread> ts = new ArrayList<>();
>     for (int i = 0; i < 10; i++) {
>         ts.add(new Thread(new Runnable() {
>             @Override
>             public void run() {
>                 for (int j = 0; j < 1000; j++) {
>                     try {
>                         ByteBuf buf = Unpooled.buffer(8);
>                         buf.writeLong((long) j);
>                         ByteBuffer byteBuffer = clientFactory
>                                 .createClient("localhost", 8081)
>                                 .sendRpcSync(buf.nioBuffer(), Long.MAX_VALUE);
>                         long response = byteBuffer.getLong();
>                         // The echo should return j; anything else means the
>                         // response was delivered to the wrong request.
>                         if (response != j) {
>                             System.err.println("send:" + j + ",response:" + response);
>                         }
>                     } catch (IOException e) {
>                         e.printStackTrace();
>                     }
>                 }
>             }
>         }));
>         ts.get(i).start();
>     }
>     for (Thread t : ts) {
>         t.join();
>     }
>     server.close();
> }
>
> public class RankHandler extends RpcHandler {
>     private final Logger logger = LoggerFactory.getLogger(RankHandler.class);
>     private final StreamManager streamManager;
>
>     public RankHandler() {
>         this.streamManager = new OneForOneStreamManager();
>     }
>
>     @Override
>     public void receive(TransportClient client, ByteBuffer msg, RpcResponseCallback callback) {
>         // Echo the request payload straight back as the RPC response.
>         callback.onSuccess(msg);
>     }
>
>     @Override
>     public StreamManager getStreamManager() {
>         return streamManager;
>     }
> }
> {code}
> It will print output like the following:
> send:221,response:222
> send:233,response:234
> send:312,response:313
> send:358,response:359
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13652) TransportClient.sendRpcSync returns wrong results

2016-03-03 Thread huangyu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179157#comment-15179157
 ] 

huangyu commented on SPARK-13652:
-

I think it is about fetching a stream rather than sendRpc.

> TransportClient.sendRpcSync returns wrong results
> -
>
> Key: SPARK-13652
> URL: https://issues.apache.org/jira/browse/SPARK-13652
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: huangyu
> Attachments: RankHandler.java, Test.java
>
>
> TransportClient is not thread safe, and if it is called from multiple
> threads the messages can't be encoded and decoded correctly. Below is my
> code, and it will print wrong messages.
> {code}
> public static void main(String[] args) throws IOException, InterruptedException {
>     // Start an echo server; RankHandler (below) replies with the request payload.
>     TransportServer server = new TransportContext(
>             new TransportConf("test", new MapConfigProvider(new HashMap())),
>             new RankHandler())
>             .createServer(8081, new LinkedList());
>     // Client side: ten threads share one factory and issue synchronous RPCs.
>     TransportContext context = new TransportContext(
>             new TransportConf("test", new MapConfigProvider(new HashMap())),
>             new NoOpRpcHandler(), true);
>     final TransportClientFactory clientFactory = context.createClientFactory();
>     List<Thread> ts = new ArrayList<>();
>     for (int i = 0; i < 10; i++) {
>         ts.add(new Thread(new Runnable() {
>             @Override
>             public void run() {
>                 for (int j = 0; j < 1000; j++) {
>                     try {
>                         ByteBuf buf = Unpooled.buffer(8);
>                         buf.writeLong((long) j);
>                         ByteBuffer byteBuffer = clientFactory
>                                 .createClient("localhost", 8081)
>                                 .sendRpcSync(buf.nioBuffer(), Long.MAX_VALUE);
>                         long response = byteBuffer.getLong();
>                         // The echo should return j; anything else means the
>                         // response was delivered to the wrong request.
>                         if (response != j) {
>                             System.err.println("send:" + j + ",response:" + response);
>                         }
>                     } catch (IOException e) {
>                         e.printStackTrace();
>                     }
>                 }
>             }
>         }));
>         ts.get(i).start();
>     }
>     for (Thread t : ts) {
>         t.join();
>     }
>     server.close();
> }
>
> public class RankHandler extends RpcHandler {
>     private final Logger logger = LoggerFactory.getLogger(RankHandler.class);
>     private final StreamManager streamManager;
>
>     public RankHandler() {
>         this.streamManager = new OneForOneStreamManager();
>     }
>
>     @Override
>     public void receive(TransportClient client, ByteBuffer msg, RpcResponseCallback callback) {
>         // Echo the request payload straight back as the RPC response.
>         callback.onSuccess(msg);
>     }
>
>     @Override
>     public StreamManager getStreamManager() {
>         return streamManager;
>     }
> }
> {code}
> It will print output like the following:
> send:221,response:222
> send:233,response:234
> send:312,response:313
> send:358,response:359
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13652) spark netty network issue

2016-03-03 Thread huangyu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178100#comment-15178100
 ] 

huangyu edited comment on SPARK-13652 at 3/3/16 4:55 PM:
-

Does it mean that I can't send RPC messages in parallel through a single 
TransportClient? Then how should I use it from multiple threads, since the 
class comment says "Concurrency: thread safe and can be called from multiple 
threads."? Can you give me an example? Thank you very much.
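For example, would the intended workaround be to serialize the synchronous calls myself? A minimal sketch of what I mean, replacing the body of the try block in my test above (the rpcLock object is mine, not part of the API):

{code}
// Hypothetical workaround: share one client but serialize sendRpcSync calls
// with an external lock, so only one RPC per client is in flight at a time.
final Object rpcLock = new Object();
// ... inside each worker thread's loop:
ByteBuf buf = Unpooled.buffer(8);
buf.writeLong((long) j);
long response;
synchronized (rpcLock) {
    response = clientFactory.createClient("localhost", 8081)
            .sendRpcSync(buf.nioBuffer(), Long.MAX_VALUE)
            .getLong();
}
if (response != j) {
    System.err.println("send:" + j + ",response:" + response);
}
{code}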


was (Author: huang_yuu):
Does it mean that for a TransportClient, i can't send rpc messages in parallel. 
then how can i use it in multiple threads,because i see the comment 
"Concurrency: thread safe and can be called from multiple threads."? can you 
give me a example? thank you very much

> spark netty network issue
> -
>
> Key: SPARK-13652
> URL: https://issues.apache.org/jira/browse/SPARK-13652
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.5.1, 1.5.2, 1.6.0
>Reporter: huangyu
> Attachments: RankHandler.java, Test.java
>
>
> TransportClient is not thread safe, and if it is called from multiple
> threads the messages can't be encoded and decoded correctly. Below is my
> code, and it will print wrong messages.
> {code}
> public static void main(String[] args) throws IOException, InterruptedException {
>     // Start an echo server; RankHandler (below) replies with the request payload.
>     TransportServer server = new TransportContext(
>             new TransportConf("test", new MapConfigProvider(new HashMap())),
>             new RankHandler())
>             .createServer(8081, new LinkedList());
>     // Client side: ten threads share one factory and issue synchronous RPCs.
>     TransportContext context = new TransportContext(
>             new TransportConf("test", new MapConfigProvider(new HashMap())),
>             new NoOpRpcHandler(), true);
>     final TransportClientFactory clientFactory = context.createClientFactory();
>     List<Thread> ts = new ArrayList<>();
>     for (int i = 0; i < 10; i++) {
>         ts.add(new Thread(new Runnable() {
>             @Override
>             public void run() {
>                 for (int j = 0; j < 1000; j++) {
>                     try {
>                         ByteBuf buf = Unpooled.buffer(8);
>                         buf.writeLong((long) j);
>                         ByteBuffer byteBuffer = clientFactory
>                                 .createClient("localhost", 8081)
>                                 .sendRpcSync(buf.nioBuffer(), Long.MAX_VALUE);
>                         long response = byteBuffer.getLong();
>                         // The echo should return j; anything else means the
>                         // response was delivered to the wrong request.
>                         if (response != j) {
>                             System.err.println("send:" + j + ",response:" + response);
>                         }
>                     } catch (IOException e) {
>                         e.printStackTrace();
>                     }
>                 }
>             }
>         }));
>         ts.get(i).start();
>     }
>     for (Thread t : ts) {
>         t.join();
>     }
>     server.close();
> }
>
> public class RankHandler extends RpcHandler {
>     private final Logger logger = LoggerFactory.getLogger(RankHandler.class);
>     private final StreamManager streamManager;
>
>     public RankHandler() {
>         this.streamManager = new OneForOneStreamManager();
>     }
>
>     @Override
>     public void receive(TransportClient client, ByteBuffer msg, RpcResponseCallback callback) {
>         // Echo the request payload straight back as the RPC response.
>         callback.onSuccess(msg);
>     }
>
>     @Override
>     public StreamManager getStreamManager() {
>         return streamManager;
>     }
> }
> {code}
> It will print output like the following:
> send:221,response:222
> send:233,response:234
> send:312,response:313
> send:358,response:359
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13652) spark netty network issue

2016-03-03 Thread huangyu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178100#comment-15178100
 ] 

huangyu commented on SPARK-13652:
-

Does it mean that I can't send RPC messages in parallel through a single 
TransportClient? Then how should I use it from multiple threads, since the 
class comment says "Concurrency: thread safe and can be called from multiple 
threads."? Can you give me an example? Thank you very much.

> spark netty network issue
> -
>
> Key: SPARK-13652
> URL: https://issues.apache.org/jira/browse/SPARK-13652
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.5.1, 1.5.2, 1.6.0
>Reporter: huangyu
> Attachments: RankHandler.java, Test.java
>
>
> TransportClient is not thread safe, and if it is called from multiple
> threads the messages can't be encoded and decoded correctly. Below is my
> code, and it will print wrong messages.
> {code}
> public static void main(String[] args) throws IOException, InterruptedException {
>     // Start an echo server; RankHandler (below) replies with the request payload.
>     TransportServer server = new TransportContext(
>             new TransportConf("test", new MapConfigProvider(new HashMap())),
>             new RankHandler())
>             .createServer(8081, new LinkedList());
>     // Client side: ten threads share one factory and issue synchronous RPCs.
>     TransportContext context = new TransportContext(
>             new TransportConf("test", new MapConfigProvider(new HashMap())),
>             new NoOpRpcHandler(), true);
>     final TransportClientFactory clientFactory = context.createClientFactory();
>     List<Thread> ts = new ArrayList<>();
>     for (int i = 0; i < 10; i++) {
>         ts.add(new Thread(new Runnable() {
>             @Override
>             public void run() {
>                 for (int j = 0; j < 1000; j++) {
>                     try {
>                         ByteBuf buf = Unpooled.buffer(8);
>                         buf.writeLong((long) j);
>                         ByteBuffer byteBuffer = clientFactory
>                                 .createClient("localhost", 8081)
>                                 .sendRpcSync(buf.nioBuffer(), Long.MAX_VALUE);
>                         long response = byteBuffer.getLong();
>                         // The echo should return j; anything else means the
>                         // response was delivered to the wrong request.
>                         if (response != j) {
>                             System.err.println("send:" + j + ",response:" + response);
>                         }
>                     } catch (IOException e) {
>                         e.printStackTrace();
>                     }
>                 }
>             }
>         }));
>         ts.get(i).start();
>     }
>     for (Thread t : ts) {
>         t.join();
>     }
>     server.close();
> }
>
> public class RankHandler extends RpcHandler {
>     private final Logger logger = LoggerFactory.getLogger(RankHandler.class);
>     private final StreamManager streamManager;
>
>     public RankHandler() {
>         this.streamManager = new OneForOneStreamManager();
>     }
>
>     @Override
>     public void receive(TransportClient client, ByteBuffer msg, RpcResponseCallback callback) {
>         // Echo the request payload straight back as the RPC response.
>         callback.onSuccess(msg);
>     }
>
>     @Override
>     public StreamManager getStreamManager() {
>         return streamManager;
>     }
> }
> {code}
> It will print output like the following:
> send:221,response:222
> send:233,response:234
> send:312,response:313
> send:358,response:359
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13652) spark netty network issue

2016-03-03 Thread huangyu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huangyu updated SPARK-13652:

Summary: spark netty network issue  (was: spark netty network issu)

> spark netty network issue
> -
>
> Key: SPARK-13652
> URL: https://issues.apache.org/jira/browse/SPARK-13652
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.5.1, 1.5.2, 1.6.0
>Reporter: huangyu
> Attachments: RankHandler.java, Test.java
>
>
> TransportClient is not thread safe, and if it is called from multiple
> threads the messages can't be encoded and decoded correctly. Below is my
> code, and it will print wrong messages.
> {code}
> public static void main(String[] args) throws IOException, InterruptedException {
>     // Start an echo server; RankHandler (below) replies with the request payload.
>     TransportServer server = new TransportContext(
>             new TransportConf("test", new MapConfigProvider(new HashMap())),
>             new RankHandler())
>             .createServer(8081, new LinkedList());
>     // Client side: ten threads share one factory and issue synchronous RPCs.
>     TransportContext context = new TransportContext(
>             new TransportConf("test", new MapConfigProvider(new HashMap())),
>             new NoOpRpcHandler(), true);
>     final TransportClientFactory clientFactory = context.createClientFactory();
>     List<Thread> ts = new ArrayList<>();
>     for (int i = 0; i < 10; i++) {
>         ts.add(new Thread(new Runnable() {
>             @Override
>             public void run() {
>                 for (int j = 0; j < 1000; j++) {
>                     try {
>                         ByteBuf buf = Unpooled.buffer(8);
>                         buf.writeLong((long) j);
>                         ByteBuffer byteBuffer = clientFactory
>                                 .createClient("localhost", 8081)
>                                 .sendRpcSync(buf.nioBuffer(), Long.MAX_VALUE);
>                         long response = byteBuffer.getLong();
>                         // The echo should return j; anything else means the
>                         // response was delivered to the wrong request.
>                         if (response != j) {
>                             System.err.println("send:" + j + ",response:" + response);
>                         }
>                     } catch (IOException e) {
>                         e.printStackTrace();
>                     }
>                 }
>             }
>         }));
>         ts.get(i).start();
>     }
>     for (Thread t : ts) {
>         t.join();
>     }
>     server.close();
> }
>
> public class RankHandler extends RpcHandler {
>     private final Logger logger = LoggerFactory.getLogger(RankHandler.class);
>     private final StreamManager streamManager;
>
>     public RankHandler() {
>         this.streamManager = new OneForOneStreamManager();
>     }
>
>     @Override
>     public void receive(TransportClient client, ByteBuffer msg, RpcResponseCallback callback) {
>         // Echo the request payload straight back as the RPC response.
>         callback.onSuccess(msg);
>     }
>
>     @Override
>     public StreamManager getStreamManager() {
>         return streamManager;
>     }
> }
> {code}
> It will print output like the following:
> send:221,response:222
> send:233,response:234
> send:312,response:313
> send:358,response:359
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13652) spark netty network issu

2016-03-03 Thread huangyu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huangyu updated SPARK-13652:

Attachment: Test.java
RankHandler.java

> spark netty network issu
> 
>
> Key: SPARK-13652
> URL: https://issues.apache.org/jira/browse/SPARK-13652
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.5.1, 1.5.2, 1.6.0
>Reporter: huangyu
> Attachments: RankHandler.java, Test.java
>
>
> TransportClient is not thread safe, and if it is called from multiple
> threads the messages can't be encoded and decoded correctly. Below is my
> code, and it will print wrong messages.
> {code}
> public static void main(String[] args) throws IOException, InterruptedException {
>     // Start an echo server; RankHandler (below) replies with the request payload.
>     TransportServer server = new TransportContext(
>             new TransportConf("test", new MapConfigProvider(new HashMap())),
>             new RankHandler())
>             .createServer(8081, new LinkedList());
>     // Client side: ten threads share one factory and issue synchronous RPCs.
>     TransportContext context = new TransportContext(
>             new TransportConf("test", new MapConfigProvider(new HashMap())),
>             new NoOpRpcHandler(), true);
>     final TransportClientFactory clientFactory = context.createClientFactory();
>     List<Thread> ts = new ArrayList<>();
>     for (int i = 0; i < 10; i++) {
>         ts.add(new Thread(new Runnable() {
>             @Override
>             public void run() {
>                 for (int j = 0; j < 1000; j++) {
>                     try {
>                         ByteBuf buf = Unpooled.buffer(8);
>                         buf.writeLong((long) j);
>                         ByteBuffer byteBuffer = clientFactory
>                                 .createClient("localhost", 8081)
>                                 .sendRpcSync(buf.nioBuffer(), Long.MAX_VALUE);
>                         long response = byteBuffer.getLong();
>                         // The echo should return j; anything else means the
>                         // response was delivered to the wrong request.
>                         if (response != j) {
>                             System.err.println("send:" + j + ",response:" + response);
>                         }
>                     } catch (IOException e) {
>                         e.printStackTrace();
>                     }
>                 }
>             }
>         }));
>         ts.get(i).start();
>     }
>     for (Thread t : ts) {
>         t.join();
>     }
>     server.close();
> }
>
> public class RankHandler extends RpcHandler {
>     private final Logger logger = LoggerFactory.getLogger(RankHandler.class);
>     private final StreamManager streamManager;
>
>     public RankHandler() {
>         this.streamManager = new OneForOneStreamManager();
>     }
>
>     @Override
>     public void receive(TransportClient client, ByteBuffer msg, RpcResponseCallback callback) {
>         // Echo the request payload straight back as the RPC response.
>         callback.onSuccess(msg);
>     }
>
>     @Override
>     public StreamManager getStreamManager() {
>         return streamManager;
>     }
> }
> {code}
> It will print output like the following:
> send:221,response:222
> send:233,response:234
> send:312,response:313
> send:358,response:359
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13652) spark netty network issu

2016-03-03 Thread huangyu (JIRA)
huangyu created SPARK-13652:
---

 Summary: spark netty network issu
 Key: SPARK-13652
 URL: https://issues.apache.org/jira/browse/SPARK-13652
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.6.0, 1.5.2, 1.5.1
Reporter: huangyu


TransportClient is not thread safe, and if it is called from multiple threads 
the messages can't be encoded and decoded correctly. Below is my code, and it 
will print wrong messages.

public static void main(String[] args) throws IOException, InterruptedException {
    // Start an echo server; RankHandler replies with the request payload.
    TransportServer server = new TransportContext(
            new TransportConf("test", new MapConfigProvider(new HashMap())),
            new RankHandler())
            .createServer(8081, new LinkedList());
    // Client side: ten threads share one factory and issue synchronous RPCs.
    TransportContext context = new TransportContext(
            new TransportConf("test", new MapConfigProvider(new HashMap())),
            new NoOpRpcHandler(), true);
    final TransportClientFactory clientFactory = context.createClientFactory();
    List<Thread> ts = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
        ts.add(new Thread(new Runnable() {
            @Override
            public void run() {
                for (int j = 0; j < 1000; j++) {
                    try {
                        ByteBuf buf = Unpooled.buffer(8);
                        buf.writeLong((long) j);
                        ByteBuffer byteBuffer = clientFactory
                                .createClient("localhost", 8081)
                                .sendRpcSync(buf.nioBuffer(), Long.MAX_VALUE);
                        long response = byteBuffer.getLong();
                        // The echo should return j; anything else means the
                        // response was delivered to the wrong request.
                        if (response != j) {
                            System.err.println("send:" + j + ",response:" + response);
                        }
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }));
        ts.get(i).start();
    }
    for (Thread t : ts) {
        t.join();
    }
    server.close();
}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13652) spark netty network issu

2016-03-03 Thread huangyu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huangyu updated SPARK-13652:

Description: 
TransportClient is not thread safe, and if it is called from multiple threads 
the messages can't be encoded and decoded correctly. Below is my code, and it 
will print wrong messages.

public static void main(String[] args) throws IOException, InterruptedException {
    // Start an echo server; RankHandler (below) replies with the request payload.
    TransportServer server = new TransportContext(
            new TransportConf("test", new MapConfigProvider(new HashMap())),
            new RankHandler())
            .createServer(8081, new LinkedList());
    // Client side: ten threads share one factory and issue synchronous RPCs.
    TransportContext context = new TransportContext(
            new TransportConf("test", new MapConfigProvider(new HashMap())),
            new NoOpRpcHandler(), true);
    final TransportClientFactory clientFactory = context.createClientFactory();
    List<Thread> ts = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
        ts.add(new Thread(new Runnable() {
            @Override
            public void run() {
                for (int j = 0; j < 1000; j++) {
                    try {
                        ByteBuf buf = Unpooled.buffer(8);
                        buf.writeLong((long) j);
                        ByteBuffer byteBuffer = clientFactory
                                .createClient("localhost", 8081)
                                .sendRpcSync(buf.nioBuffer(), Long.MAX_VALUE);
                        long response = byteBuffer.getLong();
                        // The echo should return j; anything else means the
                        // response was delivered to the wrong request.
                        if (response != j) {
                            System.err.println("send:" + j + ",response:" + response);
                        }
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }));
        ts.get(i).start();
    }
    for (Thread t : ts) {
        t.join();
    }
    server.close();
}

public class RankHandler extends RpcHandler {
    private final Logger logger = LoggerFactory.getLogger(RankHandler.class);
    private final StreamManager streamManager;

    public RankHandler() {
        this.streamManager = new OneForOneStreamManager();
    }

    @Override
    public void receive(TransportClient client, ByteBuffer msg, RpcResponseCallback callback) {
        // Echo the request payload straight back as the RPC response.
        callback.onSuccess(msg);
    }

    @Override
    public StreamManager getStreamManager() {
        return streamManager;
    }
}

It will print output like the following:
send:221,response:222
send:233,response:234
send:312,response:313
send:358,response:359
...

  was:
TransportClient is not thread safe, and if it is called from multiple threads 
the messages can't be encoded and decoded correctly. Below is my code, and it 
will print wrong messages.

public static void main(String[] args) throws IOException, InterruptedException {
    // Start an echo server; RankHandler replies with the request payload.
    TransportServer server = new TransportContext(
            new TransportConf("test", new MapConfigProvider(new HashMap())),
            new RankHandler())
            .createServer(8081, new LinkedList());
    // Client side: ten threads share one factory and issue synchronous RPCs.
    TransportContext context = new TransportContext(
            new TransportConf("test", new MapConfigProvider(new HashMap())),
            new NoOpRpcHandler(), true);
    final TransportClientFactory clientFactory = context.createClientFactory();
    List<Thread> ts = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
        ts.add(new Thread(new Runnable() {
            @Override
            public void run() {
                for (int j = 0; j < 1000; j++) {
                    try {
                        ByteBuf buf = Unpooled.buffer(8);
                        buf.writeLong((long) j);
                        ByteBuffer byteBuffer = clientFactory
                                .createClient("localhost", 8081)
                                .sendRpcSync(buf.nioBuffer(), Long.MAX_VALUE);
                        long response = byteBuffer.getLong();
                        // The echo should return j; anything else means the
                        // response was delivered to the wrong request.
                        if (response != j) {
                            System.err.println("send:" + j + ",response:" + response);
                        }
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }));
        ts.get(i).start();
    }
    for (Thread t : ts) {
        t.join();
    }
    server.close();
}

It will print output like the following:
send:221,response:222
send:233,response:234
send:312,response:313
send:358,response:359
...


> spark netty network issu
> 
>
> Key: SPARK-13652
> URL: https://issues.apache.org/jira/browse/SPARK-13652
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.5.1, 1.5.2, 1.6.0
>Reporter: huangyu
>
> TransportClient is not thread s

[jira] [Updated] (SPARK-13652) spark netty network issu

2016-03-03 Thread huangyu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huangyu updated SPARK-13652:

Description: 
TransportClient is not thread safe, and if it is called from multiple threads 
the messages can't be encoded and decoded correctly. Below is my code, and it 
will print wrong messages.

public static void main(String[] args) throws IOException, InterruptedException {
    // Start an echo server; RankHandler replies with the request payload.
    TransportServer server = new TransportContext(
            new TransportConf("test", new MapConfigProvider(new HashMap())),
            new RankHandler())
            .createServer(8081, new LinkedList());
    // Client side: ten threads share one factory and issue synchronous RPCs.
    TransportContext context = new TransportContext(
            new TransportConf("test", new MapConfigProvider(new HashMap())),
            new NoOpRpcHandler(), true);
    final TransportClientFactory clientFactory = context.createClientFactory();
    List<Thread> ts = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
        ts.add(new Thread(new Runnable() {
            @Override
            public void run() {
                for (int j = 0; j < 1000; j++) {
                    try {
                        ByteBuf buf = Unpooled.buffer(8);
                        buf.writeLong((long) j);
                        ByteBuffer byteBuffer = clientFactory
                                .createClient("localhost", 8081)
                                .sendRpcSync(buf.nioBuffer(), Long.MAX_VALUE);
                        long response = byteBuffer.getLong();
                        // The echo should return j; anything else means the
                        // response was delivered to the wrong request.
                        if (response != j) {
                            System.err.println("send:" + j + ",response:" + response);
                        }
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }));
        ts.get(i).start();
    }
    for (Thread t : ts) {
        t.join();
    }
    server.close();
}

It will print output like the following:
send:221,response:222
send:233,response:234
send:312,response:313
send:358,response:359
...

  was:
TransportClient is not thread safe, and if it is called from multiple threads 
the messages can't be encoded and decoded correctly. Below is my code, and it 
will print wrong messages.

public static void main(String[] args) throws IOException, InterruptedException {
    // Start an echo server; RankHandler replies with the request payload.
    TransportServer server = new TransportContext(
            new TransportConf("test", new MapConfigProvider(new HashMap())),
            new RankHandler())
            .createServer(8081, new LinkedList());
    // Client side: ten threads share one factory and issue synchronous RPCs.
    TransportContext context = new TransportContext(
            new TransportConf("test", new MapConfigProvider(new HashMap())),
            new NoOpRpcHandler(), true);
    final TransportClientFactory clientFactory = context.createClientFactory();
    List<Thread> ts = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
        ts.add(new Thread(new Runnable() {
            @Override
            public void run() {
                for (int j = 0; j < 1000; j++) {
                    try {
                        ByteBuf buf = Unpooled.buffer(8);
                        buf.writeLong((long) j);
                        ByteBuffer byteBuffer = clientFactory
                                .createClient("localhost", 8081)
                                .sendRpcSync(buf.nioBuffer(), Long.MAX_VALUE);
                        long response = byteBuffer.getLong();
                        // The echo should return j; anything else means the
                        // response was delivered to the wrong request.
                        if (response != j) {
                            System.err.println("send:" + j + ",response:" + response);
                        }
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }));
        ts.get(i).start();
    }
    for (Thread t : ts) {
        t.join();
    }
    server.close();
}


> spark netty network issu
> 
>
> Key: SPARK-13652
> URL: https://issues.apache.org/jira/browse/SPARK-13652
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.5.1, 1.5.2, 1.6.0
>Reporter: huangyu
>
> TransportClient is not thread safe, and if it is called from multiple
> threads the messages can't be encoded and decoded correctly. Below is my
> code, and it will print wrong messages.
> public static void main(String[] args) throws IOException, InterruptedException {
>     TransportServer server = new TransportContext(
>             new TransportConf("test", new MapConfigProvider(new HashMap())),
>             new RankHandler())
>             .createServer(8081, new LinkedList());
>     TransportContext context = new TransportContext(
>             new TransportConf("test", new MapConfigProvider(new HashMap())),
>             new NoOpRpcHandler(