dongjoon-hyun commented on PR #1722:
URL: https://github.com/apache/orc/pull/1722#issuecomment-1877809293
To @cxzl25, Apache Spark has three ORC code paths.
- `sql` module's `OrcFileFormat`:
  - https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
- `hive` module's `OrcFileFormat`:
  - https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala
- `hive` module's `HiveFileFormat`:
  - https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala
It seems that you are using `HiveFileFormat`. It follows Hive's behavior,
where the output file extension is determined by `hive.output.file.extension`.
Please try the following from the beginning.
```
$ bin/spark-sql -c hive.output.file.extension=orc
...
spark-sql (default)> create table t stored as orc as select id from range(1);
Time taken: 1.706 seconds
$ ls -al spark-warehouse/t
total 40
drwxr-xr-x 8 dongjoon staff 256 Jan 4 13:50 .
drwxr-xr-x 3 dongjoon staff 96 Jan 4 13:50 ..
-rw-r--r-- 1 dongjoon staff 8 Jan 4 13:50 ._SUCCESS.crc
-rw-r--r-- 1 dongjoon staff 12 Jan 4 13:50 .part-00000-492978c5-6833-4dd3-a417-eba7a49b9d44-c000.snappy.orc.crc
-rw-r--r-- 1 dongjoon staff 12 Jan 4 13:50 .part-00009-492978c5-6833-4dd3-a417-eba7a49b9d44-c000.snappy.orc.crc
-rw-r--r-- 1 dongjoon staff 0 Jan 4 13:50 _SUCCESS
-rw-r--r-- 1 dongjoon staff 116 Jan 4 13:50 part-00000-492978c5-6833-4dd3-a417-eba7a49b9d44-c000.snappy.orc
-rw-r--r-- 1 dongjoon staff 237 Jan 4 13:50 part-00009-492978c5-6833-4dd3-a417-eba7a49b9d44-c000.snappy.orc
```
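If it helps to verify the result on a larger table, a check like the one below could be used. This is only a minimal sketch: `files_missing_ext` is a hypothetical helper name (not part of Spark or Hive), and it assumes the table lives in a local directory such as `spark-warehouse/t` and that the extension is passed without the leading dot.

```shell
# Hypothetical helper (not a Spark or Hive command): print the data files in a
# table directory whose names do not end with the expected extension.
# Hidden files such as the .crc checksums are not matched by the glob, and
# marker files such as _SUCCESS are skipped explicitly.
files_missing_ext() {
  dir="$1"   # table directory, e.g. spark-warehouse/t (assumed to be local)
  ext="$2"   # expected extension without the dot, e.g. orc
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    case "$name" in
      _*) continue ;;        # marker files like _SUCCESS
      *."$ext") ;;           # already has the expected extension
      *) echo "$name" ;;     # data file without the extension
    esac
  done
}
```

For example, `files_missing_ext spark-warehouse/t orc` should print nothing when every data file in the directory ends with `.orc`.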