[ https://issues.apache.org/jira/browse/SPARK-20590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303237#comment-17303237 ]
Yu Xiang commented on SPARK-20590:
----------------------------------

[~cloud_fan], I tried using the fully qualified class name as shown below, but it does not work. Any idea?
(A more detailed explanation of the problem is here: https://stackoverflow.com/questions/66664181/spark-multiple-sources-found-for-text)
{code:java}
DataFrameReader read = spark.read();
JavaRDD<String> stringJavaRDD = read.format("org.apache.spark.sql.execution.datasources.text.TextFileFormat").textFile(inputPath).javaRDD();
{code}

> Map default input data source formats to inlined classes
> --------------------------------------------------------
>
>                 Key: SPARK-20590
>                 URL: https://issues.apache.org/jira/browse/SPARK-20590
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Sameer Agarwal
>            Assignee: Hyukjin Kwon
>            Priority: Major
>             Fix For: 2.2.0
>
>
> One of the common usability problems around reading data in spark
> (particularly CSV) is that there can often be a conflict between different
> readers in the classpath.
> As an example, if someone launches a 2.x spark shell with the spark-csv
> package in the classpath, Spark currently fails in an extremely unfriendly way
> {code}
> ./bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0
> scala> val df = spark.read.csv("/foo/bar.csv")
> java.lang.RuntimeException: Multiple sources found for csv
> (org.apache.spark.sql.execution.datasources.csv.CSVFileFormat,
> com.databricks.spark.csv.DefaultSource15), please specify the fully qualified
> class name.
>   at scala.sys.package$.error(package.scala:27)
>   at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:574)
>   at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:85)
>   at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:85)
>   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:295)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>   at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:533)
>   at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:412)
>   ... 48 elided
> {code}
> This JIRA proposes a simple way of fixing this error by always mapping
> default input data source formats to inlined classes (that exist in Spark).
> {code}
> ./bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0
> scala> val df = spark.read.csv("/foo/bar.csv")
> df: org.apache.spark.sql.DataFrame = [_c0: string]
> {code}
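
The mapping proposed in the quoted description can be pictured with a small, self-contained sketch. This is not Spark's lookup code: the class `SourceResolver`, the method `resolve`, and the `org.apache.spark.sql.` prefix check are hypothetical names and assumptions chosen only to illustrate the idea that, when a short name such as `csv` matches several classes, the single built-in one wins and anything still ambiguous is an error.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only; hypothetical names, not Spark's implementation.
final class SourceResolver {
    // Assumption: built-in ("inlined") sources are recognized by package prefix.
    private static final String INTERNAL_PREFIX = "org.apache.spark.sql.";

    // Picks the class to use for a short format name from its candidate classes.
    static String resolve(String shortName, List<String> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("No source found for " + shortName);
        }
        if (candidates.size() == 1) {
            return candidates.get(0);
        }
        // Several candidates: keep only those that look built-in.
        List<String> internal = candidates.stream()
                .filter(name -> name.startsWith(INTERNAL_PREFIX))
                .collect(Collectors.toList());
        if (internal.size() == 1) {
            return internal.get(0);
        }
        // Zero or several built-in candidates: still ambiguous.
        throw new IllegalStateException("Multiple sources found for " + shortName
                + " " + candidates + ", please specify the fully qualified class name.");
    }

    public static void main(String[] args) {
        // The two class names come from the error message in the description.
        String chosen = resolve("csv", Arrays.asList(
                "org.apache.spark.sql.execution.datasources.csv.CSVFileFormat",
                "com.databricks.spark.csv.DefaultSource15"));
        System.out.println(chosen);
    }
}
```

With the two class names from the error message, the sketch selects `CSVFileFormat`; two external candidates with no built-in one would still raise the ambiguity error.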