Re: saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

Jacek Laskowski Mon, 01 Oct 2018 13:25:14 -0700

Hi,

OK. Sorry for the noise. I don't know why it started working, but I cannot
reproduce it anymore. Sorry for the false alarm (I could swear it
didn't work, and I changed nothing). Back to work...
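
For anyone who lands on this thread with the same error, here is a minimal sketch for checking which warehouse location a session actually resolved. It assumes it is pasted into spark-shell (where getOrCreate() simply returns the existing session); the app name is only illustrative. Note that /user/hive/warehouse is Hive's default value for hive.metastore.warehouse.dir, so seeing it in the error suggests the location came from a Hive configuration or metastore rather than from the spark-warehouse default.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell a session already exists, so getOrCreate() returns it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("warehouse-check") // illustrative name, not from the thread
  .getOrCreate()

// Warehouse directory the session was configured with.
val warehouseDir = spark.conf.get("spark.sql.warehouse.dir")
println(s"spark.sql.warehouse.dir = $warehouseDir")

// Location recorded for the default database in the catalog.
// Managed tables created by saveAsTable are placed under this location.
val defaultDbLocation = spark.catalog.getDatabase("default").locationUri
println(s"default database location = $defaultDbLocation")
```

If either value is not what you expect, the directory can be set explicitly at launch with --conf spark.sql.warehouse.dir=<some writable directory> (placeholder path).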


Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski


On Mon, Oct 1, 2018 at 8:45 AM Jyoti Ranjan Mahapatra <jyot...@microsoft.com>
wrote:

> Hi Jacek,
>
>
>
> The issue might not be very widespread. I couldn’t reproduce it. Can you
> see if I am doing anything incorrect in the below queries?
>
>
>
> scala> spark.range(10).write.saveAsTable("t1")
>
>
>
> scala> spark.sql("describe formatted t1").show(100, false)
>
>
> +----------------------------+-----------------------------------------------------------------------------------+-------+
> |col_name                    |data_type                                                                          |comment|
> +----------------------------+-----------------------------------------------------------------------------------+-------+
> |id                          |bigint                                                                             |null   |
> |                            |                                                                                   |       |
> |# Detailed Table Information|                                                                                   |       |
> |Database                    |default                                                                            |       |
> |Table                       |t1                                                                                 |       |
> |Owner                       |jyotima                                                                            |       |
> |Created Time                |Sun Sep 30 23:40:46 PDT 2018                                                       |       |
> |Last Access                 |Wed Dec 31 16:00:00 PST 1969                                                       |       |
> |Created By                  |Spark 2.3.2                                                                        |       |
> |Type                        |MANAGED                                                                            |       |
> |Provider                    |parquet                                                                            |       |
> |Table Properties            |[transient_lastDdlTime=1538376046]                                                 |       |
> |Statistics                  |3008 bytes                                                                         |       |
> |Location                    |file:/home/jyotima/repo/tmp/spark2.3.2/spark-2.3.2-bin-hadoop2.7/spark-warehouse/t1|       |
> |Serde Library               |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe                        |       |
> |InputFormat                 |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat                      |       |
> |OutputFormat                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat                     |       |
> |Storage Properties          |[serialization.format=1]                                                           |       |
> +----------------------------+-----------------------------------------------------------------------------------+-------+
>
>
>
> scala> spark.version
>
> res4: String = 2.3.2
>
>
>
> Thanks,
>
> Jyoti
>
> *From:* Jacek Laskowski <ja...@japila.pl>
> *Sent:* Sunday, September 30, 2018 11:28 PM
> *To:* Sean Owen <sro...@gmail.com>
> *Cc:* dev <dev@spark.apache.org>
> *Subject:* Re: saveAsTable in 2.3.2 throws IOException while 2.3.1 works
> fine?
>
>
>
> Hi Sean,
>
>
>
> Thanks again for helping me to remain sane and that the issue is not
> imaginary :)
>
>
>
> I'd expect it to be spark-warehouse in the directory where spark-shell is
> executed (which is what has always been used for the metastore).
>
>
>
> I'm reviewing all the changes between 2.3.1..2.3.2 to find anything
> relevant. I'm surprised nobody's reported it before. That worries me (or
> simply says that all the enterprise deployments simply use YARN with Hive?)
>
>
> Regards,
>
> Jacek Laskowski
>
> ----
>
> https://about.me/JacekLaskowski
>
> Mastering Spark SQL https://bit.ly/mastering-spark-sql
>
> Spark Structured Streaming https://bit.ly/spark-structured-streaming
>
> Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
>
> Follow me at https://twitter.com/jaceklaskowski
>
>
>
>
>
>
> On Sun, Sep 30, 2018 at 10:25 PM Sean Owen <sro...@gmail.com> wrote:
>
> Hm, changes in the behavior of the default warehouse dir sound
> familiar, but anything I could find was resolved well before 2.3.1
> even. I don't know of a change here. What location are you expecting?
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12343289
> On Sun, Sep 30, 2018 at 1:38 PM Jacek Laskowski <ja...@japila.pl> wrote:
> >
> > Hi Sean,
> >
> > I thought so too, but the path "file:/user/hive/warehouse/" should not
> > have been used in the first place, should it? I'm running it in spark-shell
> > 2.3.2. Why would there be any changes between 2.3.1 and 2.3.2 that I just
> > downloaded and one worked fine while the other did not? I had to downgrade
> > to 2.3.1 because of this (and do want to figure out why 2.3.2 behaves in a
> > different way).
> >
> > The part of the stack trace is below.
> >
> > ➜  spark-2.3.2-bin-hadoop2.7 ./bin/spark-shell
> > 2018-09-30 17:43:49 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > Setting default log level to "WARN".
> > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> > Spark context Web UI available at http://192.168.0.186:4040
> > Spark context available as 'sc' (master = local[*], app id = local-1538322235135).
> > Spark session available as 'spark'.
> > Welcome to
> >       ____              __
> >      / __/__  ___ _____/ /__
> >     _\ \/ _ \/ _ `/ __/  '_/
> >    /___/ .__/\_,_/_/ /_/\_\   version 2.3.2
> >       /_/
> >
> > Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
> > Type in expressions to have them evaluated.
> > Type :help for more information.
> >
> > scala> spark.version
> > res0: String = 2.3.2
> >
> > scala> spark.range(1).write.saveAsTable("demo")
> > 2018-09-30 17:44:27 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
> > 2018-09-30 17:44:28 ERROR FileOutputCommitter:314 - Mkdirs failed to create file:/user/hive/warehouse/demo/_temporary/0
> > 2018-09-30 17:44:28 ERROR Utils:91 - Aborting task
> > java.io.IOException: Mkdirs failed to create file:/user/hive/warehouse/demo/_temporary/0/_temporary/attempt_20180930174428_0000_m_000007_0 (exists=false, cwd=file:/Users/jacek/dev/apps/spark-2.3.2-bin-hadoop2.7)
> > at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:455)
> > at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
> > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
> > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
> > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
> > at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:241)
> > at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:342)
> > at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:302)
> > at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
> > at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:151)
> > at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:367)
> > at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:378)
> > at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)
> > at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)
> > at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1415)
> > at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
> > at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
> > at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
> > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> > at org.apache.spark.scheduler.Task.run(Task.scala:109)
> > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > at java.lang.Thread.run(Thread.java:748)
> >
> >
> > Regards,
> > Jacek Laskowski
> > ----
> > https://about.me/JacekLaskowski
> > Mastering Spark SQL https://bit.ly/mastering-spark-sql
> > Spark Structured Streaming https://bit.ly/spark-structured-streaming
> > Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> > Follow me at https://twitter.com/jaceklaskowski
> >
> >
> > On Sat, Sep 29, 2018 at 9:50 PM Sean Owen <sro...@gmail.com> wrote:
> >>
> >> Looks like a permission issue? Are you sure that isn't the difference,
> >> first?
> >>
> >> On Sat, Sep 29, 2018, 1:54 PM Jacek Laskowski <ja...@japila.pl> wrote:
> >>>
> >>> Hi,
> >>>
> >>> The following query fails in 2.3.2:
> >>>
> >>> scala> spark.range(10).write.saveAsTable("t1")
> >>> ...
> >>> 2018-09-29 20:48:06 ERROR FileOutputCommitter:314 - Mkdirs failed to create file:/user/hive/warehouse/bucketed/_temporary/0
> >>> 2018-09-29 20:48:07 ERROR Utils:91 - Aborting task
> >>> java.io.IOException: Mkdirs failed to create file:/user/hive/warehouse/bucketed/_temporary/0/_temporary/attempt_20180929204807_0000_m_000003_0 (exists=false, cwd=file:/Users/jacek/dev/apps/spark-2.3.2-bin-hadoop2.7)
> >>> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:455)
> >>> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
> >>> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
> >>>
> >>> While it works fine in 2.3.1.
> >>>
> >>> Could anybody explain the change in behaviour in 2.3.2? The commit /
> >>> the JIRA issue would be even nicer. Thanks.
> >>>
> >>> Regards,
> >>> Jacek Laskowski
> >>> ----
> >>> https://about.me/JacekLaskowski
> >>> Mastering Spark SQL https://bit.ly/mastering-spark-sql
> >>> Spark Structured Streaming https://bit.ly/spark-structured-streaming
> >>> Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> >>> Follow me at https://twitter.com/jaceklaskowski
>
>
