[ https://issues.apache.org/jira/browse/SPARK-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818648#comment-15818648 ]
Shuai Lin commented on SPARK-17101:
-----------------------------------

Seems this issue has already been resolved by https://github.com/apache/spark/pull/14680 ? cc [~rxin]

> Provide consistent format identifiers for TextFileFormat and ParquetFileFormat
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-17101
>                 URL: https://issues.apache.org/jira/browse/SPARK-17101
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Jacek Laskowski
>            Priority: Trivial
>
> Define the format identifier that is used in the {{Optimized Logical Plan}} of {{explain}} output for the {{text}} file format.
> {code}
> scala> spark.read.text("people.csv").cache.explain(extended = true)
> ...
> == Optimized Logical Plan ==
> InMemoryRelation [value#24], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
>    +- *FileScan text [value#24] Batched: false, Format: org.apache.spark.sql.execution.datasources.text.TextFileFormat@262e2c8c, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>
> == Physical Plan ==
> InMemoryTableScan [value#24]
>    +- InMemoryRelation [value#24], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
>          +- *FileScan text [value#24] Batched: false, Format: org.apache.spark.sql.execution.datasources.text.TextFileFormat@262e2c8c, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>
> {code}
> When you {{explain}} the {{csv}} format, you can see {{Format: CSV}} instead.
> {code}
> scala> spark.read.csv("people.csv").cache.explain(extended = true)
> == Parsed Logical Plan ==
> Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv
> == Analyzed Logical Plan ==
> _c0: string, _c1: string, _c2: string, _c3: string
> Relation[_c0#39,_c1#40,_c2#41,_c3#42] csv
> == Optimized Logical Plan ==
> InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
>    +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>
> == Physical Plan ==
> InMemoryTableScan [_c0#39, _c1#40, _c2#41, _c3#42]
>    +- InMemoryRelation [_c0#39, _c1#40, _c2#41, _c3#42], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
>          +- *FileScan csv [_c0#39,_c1#40,_c2#41,_c3#42] Batched: false, Format: CSV, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>
> {code}
> A custom format identifier is defined for JSON, too.
> {code}
> scala> spark.read.json("people.csv").cache.explain(extended = true)
> == Parsed Logical Plan ==
> Relation[_corrupt_record#93] json
> == Analyzed Logical Plan ==
> _corrupt_record: string
> Relation[_corrupt_record#93] json
> == Optimized Logical Plan ==
> InMemoryRelation [_corrupt_record#93], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
>    +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>
> == Physical Plan ==
> InMemoryTableScan [_corrupt_record#93]
>    +- InMemoryRelation [_corrupt_record#93], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
>          +- *FileScan json [_corrupt_record#93] Batched: false, Format: JSON, InputPaths: file:/Users/jacek/dev/oss/spark/people.csv, PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_corrupt_record:string>
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
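
For context on why {{text}} prints {{TextFileFormat@262e2c8c}} while CSV and JSON print short names: a JVM object with no {{toString}} override falls back to {{java.lang.Object.toString}}, which yields the {{ClassName@hashcode}} form seen in the plan. A minimal, hypothetical sketch of the fix (illustrative only, not the actual Spark source; the trait and class names here merely mimic the real ones):
{code}
// Hypothetical stand-in for Spark's FileFormat trait.
trait FileFormat

// Without a toString override, explain output would show the default
// "TextFileFormat@<hashcode>" representation.
class TextFileFormat extends FileFormat {
  override def toString: String = "Text"
}

object Demo extends App {
  println(new TextFileFormat)  // prints "Text"
}
{code}
Overriding {{toString}} on each {{FileFormat}} implementation is enough to make {{explain}} render a consistent, human-readable identifier, since the plan printer simply interpolates the format object into the string.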