Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20929#discussion_r186584474

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/TypePlaceholder.scala ---
@@ -0,0 +1,23 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.types
+
+/**
+ * An internal type that is not yet available and will be replaced by an actual type later.
+ */
+case object TypePlaceholder extends StringType
--- End diff --

In the first attempt, I used the new type instead of `NullType` because some `Sink`s (e.g., `FileStreamSink`) could not handle `NullType`:

```
// parquet
java.lang.RuntimeException: Unsupported data type NullType.
  at scala.sys.package$.error(package.scala:27)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.org$apache$spark$sql$execution$datasources$parquet$ParquetWriteSupport$$makeWriter(ParquetWriteSupport.scala:206)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$init$2.apply(ParquetWriteSupport.scala:93)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$init$2.apply(ParquetWriteSupport.scala:93)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)

// orc
java.lang.IllegalArgumentException: Can't parse category at 'struct<c0:bigint,c1:null^,c2:array<null>>'
  at org.apache.orc.TypeDescription.parseCategory(TypeDescription.java:223)
  at org.apache.orc.TypeDescription.parseType(TypeDescription.java:332)
  at org.apache.orc.TypeDescription.parseStruct(TypeDescription.java:327)
  at org.apache.orc.TypeDescription.parseType(TypeDescription.java:385)
  at org.apache.orc.TypeDescription.fromString(TypeDescription.java:406)

// csv
java.lang.UnsupportedOperationException: CSV data source does not support null data type.
  at org.apache.spark.sql.execution.datasources.csv.CSVUtils$.org$apache$spark$sql$execution$datasources$csv$CSVUtils$$verifyType$1(CSVUtils.scala:130)
  at org.apache.spark.sql.execution.datasources.csv.CSVUtils$$anonfun$verifySchema$1.apply(CSVUtils.scala:134)
  at org.apache.spark.sql.execution.datasources.csv.CSVUtils$$anonfun$verifySchema$1.apply(CSVUtils.scala:134)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
```

So, in the previous fix, I added `TypePlaceholder`, which inherits from `StringType`; every `Sink` could handle this type correctly, but the approach was too tricky.

In the suggestion, does "`NullType`, `ArrayType(NullType)`, etc. should be dropped" mean that we need to handle an inferred schema as follows?

```
Inferred schema:                    StructType<IntegerType, NullType, ArrayType(NullType)>
-> Schema used in FileStreamSource: StructType<IntegerType>
```

Is this right?
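For what it's worth, here is a minimal sketch of the dropping I have in mind (the names `SchemaPruning`/`dropNullTypeFields` and the exact recursion rules are just my illustrative assumptions, not anything already in the codebase):

```
import org.apache.spark.sql.types._

object SchemaPruning {

  // A sketch: returns true if a type carries no usable information, e.g.
  // NullType, ArrayType(NullType), or a struct whose fields are all null-like.
  private def isNullLike(dt: DataType): Boolean = dt match {
    case NullType => true
    case ArrayType(elementType, _) => isNullLike(elementType)
    case MapType(keyType, valueType, _) => isNullLike(keyType) || isNullLike(valueType)
    case StructType(fields) => fields.forall(f => isNullLike(f.dataType))
    case _ => false
  }

  // Drops null-like top-level fields and recursively prunes nested structs.
  def dropNullTypeFields(schema: StructType): StructType = {
    val kept = schema.fields.filterNot(f => isNullLike(f.dataType)).map { f =>
      f.dataType match {
        case st: StructType => f.copy(dataType = dropNullTypeFields(st))
        case _ => f
      }
    }
    StructType(kept)
  }
}
```

For the example above:

```
val inferred = StructType(Seq(
  StructField("c0", IntegerType),
  StructField("c1", NullType),
  StructField("c2", ArrayType(NullType))))

SchemaPruning.dropNullTypeFields(inferred)
// StructType(StructField(c0,IntegerType,true))
```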