[jira] [Commented] (SPARK-18789) Save Data frame with Null column-- exception
[ https://issues.apache.org/jira/browse/SPARK-18789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929272#comment-15929272 ]

Hyukjin Kwon commented on SPARK-18789:
--------------------------------------

Do you mind if I ask for a simple piece of code for this? Pseudocode is fine. (I am just trying to verify this.)

> Save Data frame with Null column-- exception
> --------------------------------------------
>
>                 Key: SPARK-18789
>                 URL: https://issues.apache.org/jira/browse/SPARK-18789
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.2
>            Reporter: Harish
>
> I am trying to save a DF to HDFS in which one column is entirely NULL (no data).
>
> col1 col2 col3
> a    1    null
> b    1    null
> c    1    null
> d    1    null
>
> Code: df.write.format("orc").save(path, mode='overwrite')
>
> Error:
> java.lang.IllegalArgumentException: Error: type expected at the position 49
> of 'string:string:string:double:string:double:string:null' but 'null' is found.
> 	at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:348)
> 	at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331)
> 	at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:392)
> 	at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305)
> 	at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:765)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcSerde.initialize(OrcSerde.java:104)
> 	at org.apache.spark.sql.hive.orc.OrcSerializer.<init>(OrcFileFormat.scala:182)
> 	at org.apache.spark.sql.hive.orc.OrcOutputWriter.<init>(OrcFileFormat.scala:225)
> 	at org.apache.spark.sql.hive.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:94)
> 	at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
> 	at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247)
> 	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> 	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:86)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> 16/12/08 19:41:49 ERROR TaskSetManager: Task 17 in stage 512.0 failed 4 times; aborting job
> 16/12/08 19:41:49 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 17 in stage 512.0 failed 4 times, most recent failure: Lost task 17.3 in stage 512.0 (TID 37290, 10.63.136.108): java.lang.IllegalArgumentException: Error: type expected at the position 49 of 'string:string:string:double:string:double:string:null' but 'null' is found.
> 	at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:348)
> 	...
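For context on the exception above: Hive's {{TypeInfoParser}} walks the colon-separated column-type string and rejects any token that is not a known type name, and the Spark 2.0 ORC writer emits {{null}} for an untyped column. A minimal pure-Python sketch of that check (the type set and the `parse_type_string` function are illustrative, not Hive's actual implementation):

```python
# Illustrative sketch of why 'string:...:null' fails: the type-string
# parser only accepts known type names, and 'null' is not one of them.
# KNOWN_TYPES is a small illustrative subset, not Hive's real registry.
KNOWN_TYPES = {"string", "int", "bigint", "double", "float", "boolean"}

def parse_type_string(type_string):
    """Split a colon-separated type string, rejecting unknown tokens."""
    names = type_string.split(":")
    for pos, name in enumerate(names):
        if name not in KNOWN_TYPES:
            # Mirrors the spirit of the reported IllegalArgumentException
            # (the real error reports a character offset, not an index).
            raise ValueError(
                "type expected at entry %d but %r is found" % (pos, name))
    return names

parse_type_string("string:string:double")   # parses fine
try:
    parse_type_string("string:string:null")  # rejected, like the report
except ValueError as e:
    print(e)
```

This is only meant to show where in the write path the failure originates: the DataFrame is valid, but its serialized type description is not parseable on the Hive side.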
[jira] [Commented] (SPARK-18789) Save Data frame with Null column-- exception
[ https://issues.apache.org/jira/browse/SPARK-18789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929259#comment-15929259 ]

Harish commented on SPARK-18789:
--------------------------------

When you create the DF dynamically, without knowing the types of the columns, you cannot define the schema. In my case I do not know the type of a column in advance. When the column type is not defined and the entire column is None, I get this error message. I hope that is clear.
[jira] [Commented] (SPARK-18789) Save Data frame with Null column-- exception
[ https://issues.apache.org/jira/browse/SPARK-18789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929084#comment-15929084 ]

Hyukjin Kwon commented on SPARK-18789:
--------------------------------------

It seems it fails during schema inference:

{code}
>>> data = [
...     ["a", 1, None],
...     ["b", 1, None],
...     ["c", 1, None],
...     ["d", 1, None],
... ]
>>> df = spark.createDataFrame(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../spark/python/pyspark/sql/session.py", line 526, in createDataFrame
    rdd, schema = self._createFromLocal(map(prepare, data), schema)
  File ".../spark/python/pyspark/sql/session.py", line 390, in _createFromLocal
    struct = self._inferSchemaFromList(data)
  File ".../spark/python/pyspark/sql/session.py", line 324, in _inferSchemaFromList
    raise ValueError("Some of types cannot be determined after inferring")
ValueError: Some of types cannot be determined after inferring
{code}

That is why I specified the schema. Did I maybe misunderstand your comment?
[jira] [Commented] (SPARK-18789) Save Data frame with Null column-- exception
[ https://issues.apache.org/jira/browse/SPARK-18789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928926#comment-15928926 ]

Harish commented on SPARK-18789:
--------------------------------

In your example you are defining the schema first and then loading the data, which works. Try to create the DF without defining the schema (column types).
[jira] [Commented] (SPARK-18789) Save Data frame with Null column-- exception
[ https://issues.apache.org/jira/browse/SPARK-18789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928780#comment-15928780 ]

Hyukjin Kwon commented on SPARK-18789:
--------------------------------------

Hm, do you mind if I ask for a reproducer?

{code}
from pyspark.sql import Row
from pyspark.sql.types import *

data = [
    ["a", 1, None],
    ["b", 1, None],
    ["c", 1, None],
    ["d", 1, None],
]

schema = StructType([
    StructField("col1", StringType(), True),
    StructField("col2", IntegerType(), True),
    StructField("col3", StringType(), True)])

df = spark.createDataFrame(data, schema)
df.write.format("orc").save("hdfs://localhost:9000/tmp/squares", mode='overwrite')
spark.read.orc("hdfs://localhost:9000/tmp/squares").show()
{code}

produces

{code}
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   a|   1|null|
|   b|   1|null|
|   c|   1|null|
|   d|   1|null|
+----+----+----+
{code}

This seems to work fine.
[jira] [Commented] (SPARK-18789) Save Data frame with Null column-- exception
[ https://issues.apache.org/jira/browse/SPARK-18789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928738#comment-15928738 ]

Hyukjin Kwon commented on SPARK-18789:
--------------------------------------

Doh, I am sorry. Let me test again and will open. I thought the script described how to reproduce it. Thanks for pointing this out. Let me try this on the current master soon.
[jira] [Commented] (SPARK-18789) Save Data frame with Null column-- exception
[ https://issues.apache.org/jira/browse/SPARK-18789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928213#comment-15928213 ]

Eugen Prokhorenko commented on SPARK-18789:
-------------------------------------------

Just wanted to mention that the initial problem involves saving null values (the Python script above doesn't have null columns in the DF).
[jira] [Commented] (SPARK-18789) Save Data frame with Null column-- exception
[ https://issues.apache.org/jira/browse/SPARK-18789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897791#comment-15897791 ] Eugen Prokhorenko commented on SPARK-18789: --- I've added a Python script (instead of the Scala app I started with) that saves a dataframe in ORC format to HDFS: https://github.com/eugenzyx/SparkOrc/blob/master/app.py Next, I need to figure out how to get null values in the first place: simply supplying a column with `None` raises an error saying the column's type cannot be determined. BTW, Scala saves nulls without any complaints: https://github.com/eugenzyx/SparkOrc/blob/master/src/main/scala/xyz/eugenzyx/App.scala#L18 This needs a more thorough look later.
[jira] [Commented] (SPARK-18789) Save Data frame with Null column-- exception
[ https://issues.apache.org/jira/browse/SPARK-18789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829906#comment-15829906 ] Eugen Prokhorenko commented on SPARK-18789: --- Hello, I'd like to investigate this issue and, if I get closer to its root cause, share whatever results I find. Currently I'm just trying to write a data frame to HDFS; once I'm sure that works, I'll proceed to saving a data frame with the problematic schema. Here's the repository in which I'll do the investigation: https://github.com/eugenzyx/SparkOrc (Spark 2.0.2, Scala 2.11.8)