[jira] [Closed] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huangyu closed SPARK-15044.
---

> spark-sql will throw "input path does not exist" exception if it handles a
> partition which exists in hive table, but the path is removed manually
> ----------------------------------------------------------------------
>
>                 Key: SPARK-15044
>                 URL: https://issues.apache.org/jira/browse/SPARK-15044
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1, 2.0.0
>            Reporter: huangyu
>
> spark-sql will throw an "input path does not exist" exception if it handles a
> partition which exists in the hive table but whose path has been removed
> manually. The situation is as follows:
>
> 1) Create a table "test": create table test (n string) partitioned by (p string)
> 2) Load some data into partition (p='1')
> 3) Remove the path related to partition (p='1') of table test manually:
>    hadoop fs -rmr /warehouse//test/p=1
> 4) Run spark-sql: spark-sql -e "select n from test where p='1';"
>
> Then it throws an exception:
> {code}
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: ./test/p=1
>         at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
>         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
>         at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
>         at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.immutable.List.foreach(List.scala:318)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>         at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
> {code}
>
> The bug is in spark 1.6.1; with spark 1.4.0 it is OK.
> I think spark-sql should ignore the path, just like hive does (and as spark-sql
> did in earlier versions), rather than throw an exception.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
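The behavior the reporter asks for amounts to filtering the partition path list through an existence check before building input splits, instead of letting `FileInputFormat.listStatus` fail on the first missing directory. A minimal local-filesystem sketch of that idea (illustrative only: Spark's real check would go through the HDFS `FileSystem` API, and `existing_partition_paths` is a made-up name for this sketch):

```python
import os
import tempfile

def existing_partition_paths(paths):
    """Keep only the partition paths whose directory still exists.

    Mirrors the behaviour the report asks for: skip partitions whose
    data directory was removed out-of-band instead of failing the
    whole query. (Sketch only; Spark would check HDFS, not the local
    filesystem.)
    """
    return [p for p in paths if os.path.isdir(p)]

# Demo: a miniature "warehouse" where one partition directory was deleted.
root = tempfile.mkdtemp()
kept = os.path.join(root, "test", "p=1")
os.makedirs(kept)
missing = os.path.join(root, "test", "p=2")  # never created, i.e. "removed"

print(existing_partition_paths([kept, missing]))  # only the surviving p=1 path
```

For what it's worth, Spark exposes a `spark.sql.hive.verifyPartitionPath` setting that performs a check along these lines; whether it applies to a given version should be verified against that version's configuration docs.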
[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419798#comment-15419798 ]

huangyu commented on SPARK-15044:
-

Hi, I know it isn't Spark's fault. However, I think it may be better to just log some error information rather than throw an exception. I am often in the situation where a table has many partitions (maybe one partition per hour) and someone deletes the paths of many of them (I really don't know why they do this; maybe there are some bugs in their program). spark-sql can't work until I fix the hive metadata, but I can't run "alter table drop partition ..." for all the missing partitions (there are too many to run). So I have no choice but to catch the exception and rebuild Spark.
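The cleanup the commenter cannot do by hand, dropping every metastore partition whose directory is gone, is straightforward to script once a partition-spec-to-path mapping is available (e.g. from `show partitions` plus the table location). A hedged sketch: `drop_statements` is a hypothetical helper, and the local-filesystem check stands in for HDFS:

```python
import os
import tempfile

def drop_statements(table, partitions):
    """Generate one ALTER TABLE ... DROP PARTITION statement per
    partition whose backing directory is gone.

    `partitions` maps a partition spec such as "p='1'" to its data
    path. Hypothetical helper: in practice the spec -> path mapping
    would come from the metastore, and the existence check would go
    through HDFS rather than the local filesystem.
    """
    return [
        "ALTER TABLE {} DROP IF EXISTS PARTITION ({});".format(table, spec)
        for spec, path in sorted(partitions.items())
        if not os.path.isdir(path)
    ]

# Demo: one partition directory survives, one was deleted out-of-band.
root = tempfile.mkdtemp()
alive = os.path.join(root, "p=1")
os.makedirs(alive)
gone = os.path.join(root, "p=2")  # never created

stmts = drop_statements("test", {"p='1'": alive, "p='2'": gone})
print(stmts)  # -> ["ALTER TABLE test DROP IF EXISTS PARTITION (p='2');"]
```

The generated statements could then be fed back through `hive -e` or `spark-sql -e`, avoiding the one-by-one manual cleanup the commenter describes.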
[jira] [Comment Edited] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268252#comment-15268252 ]

huangyu edited comment on SPARK-15044 at 5/3/16 7:26 AM:
-

Oh, I see, thank you. Of course, if I drop the partition after removing the path, it will work. But in my job I need to deal with many tables with thousands of partitions from different sources, and the above problem really bothers me. What's more, many spark-sql scripts work well in version 1.4.0, fails in 1.6.1.

was (Author: huang_yuu):
Oh, I see, thank you. Of course, if I drop the partition after removing the path, it will work. But in my job I need to deal with many tables with thousands of partitions from different sources, and the above problem really bothers me.
[jira] [Comment Edited] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268252#comment-15268252 ]

huangyu edited comment on SPARK-15044 at 5/3/16 7:27 AM:
-

Oh, I see, thank you. Of course, if I drop the partition after removing the path, it will work. But in my job I need to deal with many tables with thousands of partitions from different sources, and the above problem really bothers me. What's more, many spark-sql scripts work well in version 1.4.0, fail in 1.6.1.

was (Author: huang_yuu):
Oh, I see, thank you. Of course, if I drop the partition after removing the path, it will work. But in my job I need to deal with many tables with thousands of partitions from different sources, and the above problem really bothers me. What's more, many spark-sql scripts work well in version 1.4.0, fails in 1.6.1.
[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268252#comment-15268252 ]

huangyu commented on SPARK-15044:
-

Oh, I see, thank you. Of course, if I drop the partition after removing the path, it will work. But in my job I need to deal with many tables with thousands of partitions from different sources, and the above problem really bothers me.
[jira] [Comment Edited] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267974#comment-15267974 ]

huangyu edited comment on SPARK-15044 at 5/3/16 2:51 AM:
-

I removed the path with the hadoop command "hadoop fs -rmr /warehouse//test/p=1", not through hive. That is to say, the partition exists in the hive table, but the related path was removed manually, and spark 1.6.1 would throw an exception.

was (Author: huang_yuu):
I removed the path with the hadoop command "hadoop fs -rmr /warehouse//test/p=1", not through hive. That is to say, the partition exists in the hive table, but the related path was removed manually, and spark of version 1.6.1 would throw an exception.
[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267978#comment-15267978 ]

huangyu commented on SPARK-15044:
-

Maybe you can try it in Spark 1.6.1: when I tried in Spark 1.6.1 it threw the exception, but in Spark 1.4.0 it did not.
[jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267974#comment-15267974 ] huangyu commented on SPARK-15044: - I removed the path by hadoop command,"hadoop fs -rmr /warehouse//test/p=1", not hive. That is to say the partition exists in hive table, but the related path was removed manually, and spark of version 1.6.1 would throw exception. > spark-sql will throw "input path does not exist" exception if it handles a > partition which exists in hive table, but the path is removed manually > - > > Key: SPARK-15044 > URL: https://issues.apache.org/jira/browse/SPARK-15044 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 >Reporter: huangyu > > spark-sql will throw "input path not exist" exception if it handles a > partition which exists in hive table, but the path is removed manually.The > situation is as follows: > 1) Create a table "test". "create table test (n string) partitioned by (p > string)" > 2) Load some data into partition(p='1') > 3)Remove the path related to partition(p='1') of table test manually. 
"hadoop > fs -rmr /warehouse//test/p=1" > 4)Run spark sql, spark-sql -e "select n from test where p='1';" > Then it throws exception: > {code} > org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: > ./test/p=1 > at > org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285) > at > org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228) > at > org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at scala.collection.immutable.List.foreach(List.scala:318) > at > scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at 
scala.collection.AbstractTraversable.map(Traversable.scala:105) > at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) > at scala.Option.getOrElse(Option.scala:120) > {code} > The bug is present in Spark 1.6.1; with Spark 1.4.0 it is OK. > I think spark-sql should ignore the path, just like Hive does (and as Spark did in earlier > versions), rather than throw an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huangyu updated SPARK-15044: Description: spark-sql will throw "input path not exist" exception if it handles a partition which exists in hive table, but the path is removed manually.The situation is as follows: 1) Create a table "test". "create table test (n string) partitioned by (p string)" 2) Load some data into partition(p='1') 3)Remove the path related to partition(p='1') of table test manually. "hadoop fs -rmr /warehouse//test/p=1" 4)Run spark sql, spark-sql -e "select n from test where p='1';" Then it throws exception: {code} org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: ./test/p=1 at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at 
org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66) at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.immutable.List.foreach(List.scala:318) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) {code} The bug is present in Spark 1.6.1; with Spark 1.4.0 it is OK. I think spark-sql should ignore the path, just like Hive does (and as Spark did in earlier versions), rather than throw an exception.
[jira] [Updated] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually
[ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huangyu updated SPARK-15044: Description: spark-sql will throw "input path not exist" exception if it handles a partition which exists in hive table, but the path is removed manually.The situation is as follows: 1) Create a table "test". "create table test (n string) partitioned by (p string)" 2) Load some data into partition(p='1') 3)Remove the path related to partition(p='1') of table test manually. "hadoop fs -rmr /warehouse//test/p=1" 4)Run spark sql, spark-sql -e "select n from test where p='1';" Then it throws exception: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: ./test/p=1 at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at 
org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66) at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.immutable.List.foreach(List.scala:318) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) The bug is present in Spark 1.6.1; with Spark 1.4.0 it is OK. I think spark-sql should ignore the path, just like Hive does (and as Spark did in earlier versions), rather than throw an exception.
[jira] [Created] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually
huangyu created SPARK-15044: --- Summary: spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually Key: SPARK-15044 URL: https://issues.apache.org/jira/browse/SPARK-15044 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.1 Reporter: huangyu spark-sql will throw "input path not exist" exception if it handles a partition which exists in hive table, but the path is removed manually.The situation is as follows: 1) Create a table "test". "create table test (n string) partitioned by (p string)" 2) Load some data into partition(p='1') 3)Remove the path related to partition(p='1') of table test manually. "hadoop fs -rmr /warehouse//test/p=1" 4)Run spark sql, spark-sql -e "select n from test where p='1';" Then it throws exception: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: ./test/p=1 at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at 
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66) at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.immutable.List.foreach(List.scala:318) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) This happens in Spark 1.6.1; with Spark 1.4.0 it is OK. I think spark-sql should ignore the path, just like Hive does (and as Spark did in earlier versions), rather than throw an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-13652) TransportClient.sendRpcSync returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huangyu closed SPARK-13652. --- This issue has been fixed by Shixiong Zhu > TransportClient.sendRpcSync returns wrong results > - > > Key: SPARK-13652 > URL: https://issues.apache.org/jira/browse/SPARK-13652 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.0 >Reporter: huangyu >Assignee: Shixiong Zhu > Fix For: 1.6.2, 2.0.0 > > Attachments: RankHandler.java, Test.java > > > TransportClient is not thread safe and if it is called from multiple threads, > the messages can't be encoded and decoded correctly. Below is my code,and it > will print wrong message. > {code} > public static void main(String[] args) throws IOException, > InterruptedException { > TransportServer server = new TransportContext(new > TransportConf("test", > new MapConfigProvider(new HashMap())), new > RankHandler()). > createServer(8081, new > LinkedList()); > TransportContext context = new TransportContext(new > TransportConf("test", > new MapConfigProvider(new HashMap())), new > NoOpRpcHandler(), true); > final TransportClientFactory clientFactory = > context.createClientFactory(); > List ts = new ArrayList<>(); > for (int i = 0; i < 10; i++) { > ts.add(new Thread(new Runnable() { > @Override > public void run() { > for (int j = 0; j < 1000; j++) { > try { > ByteBuf buf = Unpooled.buffer(8); > buf.writeLong((long) j); > ByteBuffer byteBuffer = > clientFactory.createClient("localhost", 8081). 
> sendRpcSync(buf.nioBuffer(), > Long.MAX_VALUE); > long response = byteBuffer.getLong(); > if (response != j) { > System.err.println("send:" + j + ",response:" > + response); > } > } catch (IOException e) { > e.printStackTrace(); > } > } > } > })); > ts.get(i).start(); > } > for (Thread t : ts) { > t.join(); > } > server.close(); > } > public class RankHandler extends RpcHandler { > private final Logger logger = LoggerFactory.getLogger(RankHandler.class); > private final StreamManager streamManager; > public RankHandler() { > this.streamManager = new OneForOneStreamManager(); > } > @Override > public void receive(TransportClient client, ByteBuffer msg, > RpcResponseCallback callback) { > callback.onSuccess(msg); > } > @Override > public StreamManager getStreamManager() { > return streamManager; > } > } > {code} > it will print as below > send:221,response:222 > send:233,response:234 > send:312,response:313 > send:358,response:359 > ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
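The mismatched pairs in the log above (send:221,response:222 and so on) are the classic symptom of several threads sharing one connection while responses are matched to requests purely by arrival order. The usual cure is to tag every request with an id and complete the matching caller's future when the tagged response arrives, so replies may come back in any order. A self-contained sketch of that bookkeeping follows; the class and method names are illustrative, not Spark's actual transport API:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class RpcCorrelator {
    private final AtomicLong nextId = new AtomicLong();
    // outstanding requests keyed by id, so replies can arrive in any order
    private final Map<Long, CompletableFuture<Long>> outstanding =
            new ConcurrentHashMap<>();

    // caller side: reserve an id and a future before sending the frame
    public long register(CompletableFuture<Long> reply) {
        long id = nextId.getAndIncrement();
        outstanding.put(id, reply);
        return id;
    }

    // network side: the response carries the id it answers, so even
    // out-of-order delivery completes the right caller's future
    public void onResponse(long id, long payload) {
        CompletableFuture<Long> reply = outstanding.remove(id);
        if (reply != null) {
            reply.complete(payload);
        }
    }
}
```

With order-based matching instead of id-based matching, thread A can consume the response meant for thread B, which is exactly the off-by-one pairing the reporter observed.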
[jira] [Comment Edited] (SPARK-13652) TransportClient.sendRpcSync returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179157#comment-15179157 ] huangyu edited comment on SPARK-13652 at 3/4/16 3:09 AM: - I think the doc is about fetching stream rather than sendRpc was (Author: huang_yuu): I think the docs are about fetching stream rather than sendRpc > TransportClient.sendRpcSync returns wrong results > - > > Key: SPARK-13652 > URL: https://issues.apache.org/jira/browse/SPARK-13652 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.0 >Reporter: huangyu > Attachments: RankHandler.java, Test.java > > > TransportClient is not thread safe and if it is called from multiple threads, > the messages can't be encoded and decoded correctly. Below is my code,and it > will print wrong message. > {code} > public static void main(String[] args) throws IOException, > InterruptedException { > TransportServer server = new TransportContext(new > TransportConf("test", > new MapConfigProvider(new HashMap())), new > RankHandler()). > createServer(8081, new > LinkedList()); > TransportContext context = new TransportContext(new > TransportConf("test", > new MapConfigProvider(new HashMap())), new > NoOpRpcHandler(), true); > final TransportClientFactory clientFactory = > context.createClientFactory(); > List ts = new ArrayList<>(); > for (int i = 0; i < 10; i++) { > ts.add(new Thread(new Runnable() { > @Override > public void run() { > for (int j = 0; j < 1000; j++) { > try { > ByteBuf buf = Unpooled.buffer(8); > buf.writeLong((long) j); > ByteBuffer byteBuffer = > clientFactory.createClient("localhost", 8081). 
> sendRpcSync(buf.nioBuffer(), > Long.MAX_VALUE); > long response = byteBuffer.getLong(); > if (response != j) { > System.err.println("send:" + j + ",response:" > + response); > } > } catch (IOException e) { > e.printStackTrace(); > } > } > } > })); > ts.get(i).start(); > } > for (Thread t : ts) { > t.join(); > } > server.close(); > } > public class RankHandler extends RpcHandler { > private final Logger logger = LoggerFactory.getLogger(RankHandler.class); > private final StreamManager streamManager; > public RankHandler() { > this.streamManager = new OneForOneStreamManager(); > } > @Override > public void receive(TransportClient client, ByteBuffer msg, > RpcResponseCallback callback) { > callback.onSuccess(msg); > } > @Override > public StreamManager getStreamManager() { > return streamManager; > } > } > {code} > it will print as below > send:221,response:222 > send:233,response:234 > send:312,response:313 > send:358,response:359 > ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-13652) TransportClient.sendRpcSync returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179157#comment-15179157 ] huangyu edited comment on SPARK-13652 at 3/4/16 2:53 AM: - I think the docs are about fetching stream rather than sendRpc was (Author: huang_yuu): I think it is about fetching stream rather than sendRpc > TransportClient.sendRpcSync returns wrong results > - > > Key: SPARK-13652 > URL: https://issues.apache.org/jira/browse/SPARK-13652 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.0 >Reporter: huangyu > Attachments: RankHandler.java, Test.java > > > TransportClient is not thread safe and if it is called from multiple threads, > the messages can't be encoded and decoded correctly. Below is my code,and it > will print wrong message. > {code} > public static void main(String[] args) throws IOException, > InterruptedException { > TransportServer server = new TransportContext(new > TransportConf("test", > new MapConfigProvider(new HashMap())), new > RankHandler()). > createServer(8081, new > LinkedList()); > TransportContext context = new TransportContext(new > TransportConf("test", > new MapConfigProvider(new HashMap())), new > NoOpRpcHandler(), true); > final TransportClientFactory clientFactory = > context.createClientFactory(); > List ts = new ArrayList<>(); > for (int i = 0; i < 10; i++) { > ts.add(new Thread(new Runnable() { > @Override > public void run() { > for (int j = 0; j < 1000; j++) { > try { > ByteBuf buf = Unpooled.buffer(8); > buf.writeLong((long) j); > ByteBuffer byteBuffer = > clientFactory.createClient("localhost", 8081). 
> sendRpcSync(buf.nioBuffer(), > Long.MAX_VALUE); > long response = byteBuffer.getLong(); > if (response != j) { > System.err.println("send:" + j + ",response:" > + response); > } > } catch (IOException e) { > e.printStackTrace(); > } > } > } > })); > ts.get(i).start(); > } > for (Thread t : ts) { > t.join(); > } > server.close(); > } > public class RankHandler extends RpcHandler { > private final Logger logger = LoggerFactory.getLogger(RankHandler.class); > private final StreamManager streamManager; > public RankHandler() { > this.streamManager = new OneForOneStreamManager(); > } > @Override > public void receive(TransportClient client, ByteBuffer msg, > RpcResponseCallback callback) { > callback.onSuccess(msg); > } > @Override > public StreamManager getStreamManager() { > return streamManager; > } > } > {code} > it will print as below > send:221,response:222 > send:233,response:234 > send:312,response:313 > send:358,response:359 > ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-13652) TransportClient.sendRpcSync returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179186#comment-15179186 ] huangyu edited comment on SPARK-13652 at 3/4/16 2:39 AM: - I used your patch and it worked, great!! was (Author: huang_yuu): I use your patch and it worked, great!! > TransportClient.sendRpcSync returns wrong results > - > > Key: SPARK-13652 > URL: https://issues.apache.org/jira/browse/SPARK-13652 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.0 >Reporter: huangyu > Attachments: RankHandler.java, Test.java > > > TransportClient is not thread safe and if it is called from multiple threads, > the messages can't be encoded and decoded correctly. Below is my code,and it > will print wrong message. > {code} > public static void main(String[] args) throws IOException, > InterruptedException { > TransportServer server = new TransportContext(new > TransportConf("test", > new MapConfigProvider(new HashMap())), new > RankHandler()). > createServer(8081, new > LinkedList()); > TransportContext context = new TransportContext(new > TransportConf("test", > new MapConfigProvider(new HashMap())), new > NoOpRpcHandler(), true); > final TransportClientFactory clientFactory = > context.createClientFactory(); > List ts = new ArrayList<>(); > for (int i = 0; i < 10; i++) { > ts.add(new Thread(new Runnable() { > @Override > public void run() { > for (int j = 0; j < 1000; j++) { > try { > ByteBuf buf = Unpooled.buffer(8); > buf.writeLong((long) j); > ByteBuffer byteBuffer = > clientFactory.createClient("localhost", 8081). 
> sendRpcSync(buf.nioBuffer(), > Long.MAX_VALUE); > long response = byteBuffer.getLong(); > if (response != j) { > System.err.println("send:" + j + ",response:" > + response); > } > } catch (IOException e) { > e.printStackTrace(); > } > } > } > })); > ts.get(i).start(); > } > for (Thread t : ts) { > t.join(); > } > server.close(); > } > public class RankHandler extends RpcHandler { > private final Logger logger = LoggerFactory.getLogger(RankHandler.class); > private final StreamManager streamManager; > public RankHandler() { > this.streamManager = new OneForOneStreamManager(); > } > @Override > public void receive(TransportClient client, ByteBuffer msg, > RpcResponseCallback callback) { > callback.onSuccess(msg); > } > @Override > public StreamManager getStreamManager() { > return streamManager; > } > } > {code} > it will print as below > send:221,response:222 > send:233,response:234 > send:312,response:313 > send:358,response:359 > ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13652) TransportClient.sendRpcSync returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179186#comment-15179186 ] huangyu commented on SPARK-13652: - I use your patch and it worked, great!! > TransportClient.sendRpcSync returns wrong results > - > > Key: SPARK-13652 > URL: https://issues.apache.org/jira/browse/SPARK-13652 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.0 >Reporter: huangyu > Attachments: RankHandler.java, Test.java > > > TransportClient is not thread safe and if it is called from multiple threads, > the messages can't be encoded and decoded correctly. Below is my code,and it > will print wrong message. > {code} > public static void main(String[] args) throws IOException, > InterruptedException { > TransportServer server = new TransportContext(new > TransportConf("test", > new MapConfigProvider(new HashMap())), new > RankHandler()). > createServer(8081, new > LinkedList()); > TransportContext context = new TransportContext(new > TransportConf("test", > new MapConfigProvider(new HashMap())), new > NoOpRpcHandler(), true); > final TransportClientFactory clientFactory = > context.createClientFactory(); > List ts = new ArrayList<>(); > for (int i = 0; i < 10; i++) { > ts.add(new Thread(new Runnable() { > @Override > public void run() { > for (int j = 0; j < 1000; j++) { > try { > ByteBuf buf = Unpooled.buffer(8); > buf.writeLong((long) j); > ByteBuffer byteBuffer = > clientFactory.createClient("localhost", 8081). 
> sendRpcSync(buf.nioBuffer(), > Long.MAX_VALUE); > long response = byteBuffer.getLong(); > if (response != j) { > System.err.println("send:" + j + ",response:" > + response); > } > } catch (IOException e) { > e.printStackTrace(); > } > } > } > })); > ts.get(i).start(); > } > for (Thread t : ts) { > t.join(); > } > server.close(); > } > public class RankHandler extends RpcHandler { > private final Logger logger = LoggerFactory.getLogger(RankHandler.class); > private final StreamManager streamManager; > public RankHandler() { > this.streamManager = new OneForOneStreamManager(); > } > @Override > public void receive(TransportClient client, ByteBuffer msg, > RpcResponseCallback callback) { > callback.onSuccess(msg); > } > @Override > public StreamManager getStreamManager() { > return streamManager; > } > } > {code} > it will print as below > send:221,response:222 > send:233,response:234 > send:312,response:313 > send:358,response:359 > ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13652) TransportClient.sendRpcSync returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179157#comment-15179157 ] huangyu commented on SPARK-13652: - I think it is about fetching stream rather than sendRpc > TransportClient.sendRpcSync returns wrong results > - > > Key: SPARK-13652 > URL: https://issues.apache.org/jira/browse/SPARK-13652 > Project: Spark > Issue Type: Bug >Affects Versions: 1.6.0 >Reporter: huangyu > Attachments: RankHandler.java, Test.java > > > TransportClient is not thread safe and if it is called from multiple threads, > the messages can't be encoded and decoded correctly. Below is my code,and it > will print wrong message. > {code} > public static void main(String[] args) throws IOException, > InterruptedException { > TransportServer server = new TransportContext(new > TransportConf("test", > new MapConfigProvider(new HashMap())), new > RankHandler()). > createServer(8081, new > LinkedList()); > TransportContext context = new TransportContext(new > TransportConf("test", > new MapConfigProvider(new HashMap())), new > NoOpRpcHandler(), true); > final TransportClientFactory clientFactory = > context.createClientFactory(); > List ts = new ArrayList<>(); > for (int i = 0; i < 10; i++) { > ts.add(new Thread(new Runnable() { > @Override > public void run() { > for (int j = 0; j < 1000; j++) { > try { > ByteBuf buf = Unpooled.buffer(8); > buf.writeLong((long) j); > ByteBuffer byteBuffer = > clientFactory.createClient("localhost", 8081). 
> sendRpcSync(buf.nioBuffer(), > Long.MAX_VALUE); > long response = byteBuffer.getLong(); > if (response != j) { > System.err.println("send:" + j + ",response:" > + response); > } > } catch (IOException e) { > e.printStackTrace(); > } > } > } > })); > ts.get(i).start(); > } > for (Thread t : ts) { > t.join(); > } > server.close(); > } > public class RankHandler extends RpcHandler { > private final Logger logger = LoggerFactory.getLogger(RankHandler.class); > private final StreamManager streamManager; > public RankHandler() { > this.streamManager = new OneForOneStreamManager(); > } > @Override > public void receive(TransportClient client, ByteBuffer msg, > RpcResponseCallback callback) { > callback.onSuccess(msg); > } > @Override > public StreamManager getStreamManager() { > return streamManager; > } > } > {code} > it will print as below > send:221,response:222 > send:233,response:234 > send:312,response:313 > send:358,response:359 > ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-13652) spark netty network issue
[ https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178100#comment-15178100 ]

huangyu edited comment on SPARK-13652 at 3/3/16 4:55 PM:
---------------------------------------------------------

Does it mean that I can't send RPC messages in parallel on one TransportClient? Then how can I use it from multiple threads? I ask because I see the comment "Concurrency: thread safe and can be called from multiple threads." Can you give me an example? Thank you very much.

was (Author: huang_yuu): Does it mean that for a TransportClient, i can't send rpc messages in parallel. then how can i use it in multiple threads,because i see the comment "Concurrency: thread safe and can be called from multiple threads."? can you give me a example? thank you very much

> spark netty network issue
> -------------------------
>
>                 Key: SPARK-13652
>                 URL: https://issues.apache.org/jira/browse/SPARK-13652
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.5.1, 1.5.2, 1.6.0
>            Reporter: huangyu
>         Attachments: RankHandler.java, Test.java
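One answer to "how can I use it in multiple threads" is to avoid sharing the instance at all, assuming it is acceptable to hold one connection per thread. This is a generic Java sketch with a hypothetical Connection class, not the Spark TransportClientFactory API:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PerThreadClientDemo {
    // Hypothetical stand-in for a connection/client that must not be shared.
    static class Connection {
        static final AtomicInteger created = new AtomicInteger();
        Connection() { created.incrementAndGet(); }
        long sendSync(long v) { return v; } // safe: never shared across threads
    }

    // Each thread lazily gets its own Connection on first use.
    static final ThreadLocal<Connection> CLIENT =
            ThreadLocal.withInitial(Connection::new);

    public static void main(String[] args) throws InterruptedException {
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                Connection c = CLIENT.get();
                for (long j = 0; j < 100; j++) {
                    if (c.sendSync(j) != j) {
                        throw new AssertionError("mismatch at " + j);
                    }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            t.join();
        }
        // Four worker threads each called CLIENT.get() once, so exactly
        // four independent connections were created.
        System.out.println("connections created: " + Connection.created.get());
    }
}
```

The trade-off is one open connection per thread, which may be too expensive when thread counts are large; a bounded pool of connections handed out per call is the usual middle ground.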
[jira] [Commented] (SPARK-13652) spark netty network issue
[ https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178100#comment-15178100 ]

huangyu commented on SPARK-13652:
---------------------------------

Does it mean that for a TransportClient, i can't send rpc messages in parallel. then how can i use it in multiple threads,because i see the comment "Concurrency: thread safe and can be called from multiple threads."? can you give me a example? thank you very much
[jira] [Updated] (SPARK-13652) spark netty network issue
[ https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huangyu updated SPARK-13652:
----------------------------
    Summary: spark netty network issue  (was: spark netty network issu)
[jira] [Updated] (SPARK-13652) spark netty network issu
[ https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huangyu updated SPARK-13652:
----------------------------
    Attachment: Test.java
                RankHandler.java
[jira] [Created] (SPARK-13652) spark netty network issu
huangyu created SPARK-13652:
----------------------------

             Summary: spark netty network issu
                 Key: SPARK-13652
                 URL: https://issues.apache.org/jira/browse/SPARK-13652
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.6.0, 1.5.2, 1.5.1
            Reporter: huangyu

TransportClient is not thread safe and if it is called from multiple threads, the messages can't be encoded and decoded correctly.
[jira] [Updated] (SPARK-13652) spark netty network issu
[ https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huangyu updated SPARK-13652:
----------------------------
    Description: added the RankHandler source to the reproduction code; the rest of the description is unchanged
[jira] [Updated] (SPARK-13652) spark netty network issu
[ https://issues.apache.org/jira/browse/SPARK-13652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huangyu updated SPARK-13652:
----------------------------
    Description: added the sample of mismatched output ("send:221,response:222", ...) to the reproduction code; the rest of the description is unchanged