[jira] [Commented] (SPARK-20712) [SPARK 2.1 REGRESSION][SQL] Spark can't read Hive table when column type has length greater than 4000 bytes
[ https://issues.apache.org/jira/browse/SPARK-20712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797988#comment-16797988 ] Kris Geusebroek commented on SPARK-20712:
-

[~yumwang] In Scala it will also fail after issuing an overwrite:
{code:java}
withTable("t1") {
  val cols = Range(1, 2000).map(i => s"id as very_long_column_name_id$i")
  spark.range(1).selectExpr(cols: _*).selectExpr("struct(*) as nested").write.saveAsTable("t1")
  spark.table("t1").show // all good still
  spark.range(1).selectExpr(cols: _*).selectExpr("struct(*) as nested").write.mode("overwrite").saveAsTable("t1")
  spark.table("t1").show // fails
}
{code}

> [SPARK 2.1 REGRESSION][SQL] Spark can't read Hive table when column type has
> length greater than 4000 bytes
> ---
>
> Key: SPARK-20712
> URL: https://issues.apache.org/jira/browse/SPARK-20712
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.1, 2.1.2, 2.2.0, 2.3.0
> Reporter: Maciej Bryński
> Priority: Critical
>
> Hi,
> I have the following issue.
> I'm trying to read a table from Hive where one of the columns is nested, so its
> schema has a length longer than 4000 bytes.
> Everything worked on Spark 2.0.2. On 2.1.1 I'm getting an exception:
> {code}
> >> spark.read.table("SOME_TABLE")
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/opt/spark-2.1.1/python/pyspark/sql/readwriter.py", line 259, in table
>     return self._df(self._jreader.table(tableName))
>   File "/opt/spark-2.1.1/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
>   File "/opt/spark-2.1.1/python/pyspark/sql/utils.py", line 63, in deco
>     return f(*a, **kw)
>   File "/opt/spark-2.1.1/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o71.table.
> : org.apache.spark.SparkException: Cannot recognize hive type string: SOME_VERY_LONG_FIELD_TYPE
>   at org.apache.spark.sql.hive.client.HiveClientImpl.org$apache$spark$sql$hive$client$HiveClientImpl$$fromHiveColumn(HiveClientImpl.scala:789)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:365)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:365)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11.apply(HiveClientImpl.scala:361)
>   at scala.Option.map(Option.scala:146)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:361)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:359)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:279)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:226)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:225)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:268)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:359)
>   at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
>   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
>   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
>   at
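The "greater than 4000 bytes" in the ticket title refers to the limit on the length of a column's type string in the Hive metastore schema. A minimal, Spark-free Python sketch (the helper name `hive_struct_type_string` is hypothetical, for illustration only) shows how the nested struct type from the reproduction above blows far past that limit:

```python
def hive_struct_type_string(n_cols: int) -> str:
    """Approximate the Hive type string for a struct of n_cols - 1 bigint
    fields, using the same column names as the reproduction above."""
    fields = ",".join(f"very_long_column_name_id{i}:bigint" for i in range(1, n_cols))
    return f"struct<{fields}>"

# ~2000 long-named fields produce a type string tens of thousands of
# characters long, far beyond the 4000-byte limit this ticket is about.
type_string = hive_struct_type_string(2000)
print(len(type_string) > 4000)  # True
```

Any catalog that stores this struct type as a single string therefore needs more than 4000 bytes of storage for the column type, which is where the reported regression bites.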
[jira] [Comment Edited] (SPARK-20712) [SPARK 2.1 REGRESSION][SQL] Spark can't read Hive table when column type has length greater than 4000 bytes
[ https://issues.apache.org/jira/browse/SPARK-20712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797988#comment-16797988 ] Kris Geusebroek edited comment on SPARK-20712 at 3/21/19 10:53 AM:
-

[~yumwang] In Scala it will also fail after issuing an overwrite:
{code:java}
withTable("t1") {
  val cols = Range(1, 2000).map(i => s"id as very_long_column_name_id$i")
  spark.range(1).selectExpr(cols: _*).selectExpr("struct(*) as nested").write.saveAsTable("t1")
  spark.table("t1").show // all good
  spark.range(1).selectExpr(cols: _*).selectExpr("struct(*) as nested").write.mode("overwrite").saveAsTable("t1")
  spark.table("t1").show // fails
}
{code}

was (Author: krisgeus):
[~yumwang] Also in scala it will fail after issuing a overwrite:
```
withTable("t1") {
  val cols = Range(1, 2000).map(i => s"id as very_long_column_name_id$i")
  spark.range(1).selectExpr(cols: _*).selectExpr("struct(*) as nested").write.saveAsTable("t1")
  spark.table("t1").show // all good
  spark.range(1).selectExpr(cols: _*).selectExpr("struct(*) as nested").write.mode("overwrite").saveAsTable("t1")
  spark.table("t1").show // fails
}
```
[jira] [Comment Edited] (SPARK-20712) [SPARK 2.1 REGRESSION][SQL] Spark can't read Hive table when column type has length greater than 4000 bytes
[ https://issues.apache.org/jira/browse/SPARK-20712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797988#comment-16797988 ] Kris Geusebroek edited comment on SPARK-20712 at 3/21/19 10:51 AM:
-

[~yumwang] Also in scala it will fail after issuing an overwrite:
```
withTable("t1") {
  val cols = Range(1, 2000).map(i => s"id as very_long_column_name_id$i")
  spark.range(1).selectExpr(cols: _*).selectExpr("struct(*) as nested").write.saveAsTable("t1")
  spark.table("t1").show // all good
  spark.range(1).selectExpr(cols: _*).selectExpr("struct(*) as nested").write.mode("overwrite").saveAsTable("t1")
  spark.table("t1").show // fails
}
```

was (Author: krisgeus):
[~yumwang] Also in scala it will fail after issuing a overwrite:
withTable("t1") {
  val cols = Range(1, 2000).map(i => s"id as very_long_column_name_id$i")
  spark.range(1).selectExpr(cols: _*).selectExpr("struct(*) as nested").write.saveAsTable("t1")
  spark.table("t1").show # all good still
  spark.range(1).selectExpr(cols: _*).selectExpr("struct(*) as nested").write.mode("overwrite").saveAsTable("t1")
  spark.table("t1").show # fails
}
[jira] [Commented] (SPARK-24965) Spark SQL fails when reading a partitioned hive table with different formats per partition
[ https://issues.apache.org/jira/browse/SPARK-24965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561278#comment-16561278 ] Kris Geusebroek commented on SPARK-24965:
-

PR: https://github.com/apache/spark/pull/21893

> Spark SQL fails when reading a partitioned hive table with different formats
> per partition
> --
>
> Key: SPARK-24965
> URL: https://issues.apache.org/jira/browse/SPARK-24965
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.1
> Reporter: Kris Geusebroek
> Priority: Major
> Labels: pull-request-available
>
> When a Hive Parquet partitioned table contains a partition in a different
> format (Avro, for example), SELECT * fails with a read exception ("avro file
> is not a parquet file").
> Selecting in Hive behaves as expected.
> To support this, a new SQL syntax needed to be supported as well:
> * ALTER TABLE SET FILEFORMAT
> This is included in the same PR, since the unit test needs it to set up the
> test data.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
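The failure mode described above can be sketched without Spark: the reader resolves the file format once at the table level, so files in an Avro partition get handed to the Parquet reader. Below is a hedged, Spark-free Python sketch of the per-partition format dispatch the fix implies; all names (`format_for_partition`, `PARTITION_FORMATS`, the partition spec strings) are hypothetical illustrations, not Spark or Hive APIs.

```python
# Table-level storage format, as recorded for the table in the metastore.
TABLE_FORMAT = "parquet"

# Per-partition format overrides, as partition-level metadata would record
# them after changing one partition's file format.
PARTITION_FORMATS = {"dt=2018-01-01": "avro"}

def format_for_partition(partition: str) -> str:
    # The essence of the fix: consult partition metadata first, and fall
    # back to the table-level format only when there is no override.
    return PARTITION_FORMATS.get(partition, TABLE_FORMAT)

def read_partition(partition: str) -> str:
    fmt = format_for_partition(partition)
    # A real reader would select a Parquet or Avro deserializer here. The
    # reported bug amounts to always choosing TABLE_FORMAT, so Avro files
    # raised "avro file is not a parquet file" inside the Parquet reader.
    return f"reading {partition} as {fmt}"

print(read_partition("dt=2018-01-01"))  # reading dt=2018-01-01 as avro
print(read_partition("dt=2018-01-02"))  # reading dt=2018-01-02 as parquet
```

Hive's own reader resolves the SerDe per partition, which is why the same query "acts as expected" in Hive.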
[jira] [Created] (SPARK-24965) Spark SQL fails when reading a partitioned hive table with different formats per partition
Kris Geusebroek created SPARK-24965:
-

Summary: Spark SQL fails when reading a partitioned hive table with different formats per partition
Key: SPARK-24965
URL: https://issues.apache.org/jira/browse/SPARK-24965
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.3.1
Reporter: Kris Geusebroek

When a Hive Parquet partitioned table contains a partition in a different format (Avro, for example), SELECT * fails with a read exception ("avro file is not a parquet file").

Selecting in Hive behaves as expected.

To support this, a new SQL syntax needed to be supported as well:
* ALTER TABLE SET FILEFORMAT

This is included in the same PR, since the unit test needs it to set up the test data.