[ https://issues.apache.org/jira/browse/SPARK-26146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782526#comment-16782526 ]

Sean Owen commented on SPARK-26146:
-----------------------------------

It's possible, even likely, that the paranamer dependency is transitive. In this case it comes in via avro:

{code}
[INFO] net.jgp.books:spark-chapter01:jar:1.0.0-SNAPSHOT
[INFO] +- org.apache.spark:spark-core_2.11:jar:2.4.0:compile
[INFO] |  +- org.apache.avro:avro:jar:1.8.2:compile
[INFO] |  |  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
[INFO] |  |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
[INFO] |  |  +- com.thoughtworks.paranamer:paranamer:jar:2.7:compile
{code}
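
(Aside: to trace where a given artifact enters the graph, the maven-dependency-plugin can filter the tree; with -Dverbose it also reports versions omitted for conflict. A quick check, assuming a standard Maven setup:)

{code}
mvn dependency:tree -Dverbose -Dincludes=com.thoughtworks.paranamer
{code}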

Here's what I'm confused about: Spark has depended on paranamer 2.8 since 2.3.0:
https://github.com/apache/spark/blob/v2.4.0/pom.xml#L185
{code}
[INFO] +- org.apache.spark:spark-core_2.12:jar:3.0.0-SNAPSHOT:compile
[INFO] |  +- com.thoughtworks.paranamer:paranamer:jar:2.8:runtime
[INFO] |  +- org.apache.avro:avro:jar:1.8.2:compile
[INFO] |  |  +- org.codehaus.jackson:jackson-core-asl:jar
{code}

My only guess right now is that this is due to the arcane scoping rules for 
transitive dependencies in Maven. paranamer is a runtime dependency of Spark, 
while Spark is declared here as a compile-time dependency; it should be 
'provided'. That doesn't change what Maven reports (which is still weird), but 
it might solve the problem. Spark itself really does depend on paranamer 2.8; 
you can see it in 
https://search.maven.org/artifact/org.apache.spark/spark-parent_2.12/2.4.0/pom

I think this is still probably a quirk of the interaction between the code 
here and Maven.
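
(To make that concrete, here is a minimal sketch of the workaround in the application's pom.xml; the coordinates come from the trees above, everything else is illustrative rather than the project's actual configuration. Option 2 isn't discussed above, but it is the standard Maven mechanism for forcing a transitive version:)

{code:xml}
<!-- Option 1: mark Spark as 'provided'; the Spark runtime supplies it,
     so it should not be packaged with the application -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.12</artifactId>
  <version>2.4.0</version>
  <scope>provided</scope>
</dependency>

<!-- Option 2: pin paranamer 2.8 for the whole build, overriding the 2.7
     that avro pulls in transitively -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.thoughtworks.paranamer</groupId>
      <artifactId>paranamer</artifactId>
      <version>2.8</version>
    </dependency>
  </dependencies>
</dependencyManagement>
{code}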

> CSV wouldn't be ingested in Spark 2.4.0 with Scala 2.12
> -------------------------------------------------------
>
>                 Key: SPARK-26146
>                 URL: https://issues.apache.org/jira/browse/SPARK-26146
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 2.4.0
>            Reporter: Jean Georges Perrin
>            Priority: Major
>
> Ingestion of a CSV file seems to fail with Spark v2.4.0 and Scala v2.12, 
> whereas it works fine with Scala v2.11.
> When running a simple CSV ingestion like:
> {code:java}
> // Imports needed for this snippet
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.SparkSession;
>
>     // Creates a session on a local master
>     SparkSession spark = SparkSession.builder()
>         .appName("CSV to Dataset")
>         .master("local")
>         .getOrCreate();
>
>     // Reads a CSV file with header, called books.csv, and stores it in a dataframe
>     Dataset<Row> df = spark.read().format("csv")
>         .option("header", "true")
>         .load("data/books.csv");
> {code}
>   With Scala 2.12, I get: 
> {code:java}
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 10582
> at 
> com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.accept(BytecodeReadingParanamer.java:563)
> at 
> com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.access$200(BytecodeReadingParanamer.java:338)
> at 
> com.thoughtworks.paranamer.BytecodeReadingParanamer.lookupParameterNames(BytecodeReadingParanamer.java:103)
> at 
> com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:90)
> at 
> com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.getCtorParams(BeanIntrospector.scala:44)
> at 
> com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1(BeanIntrospector.scala:58)
> at 
> com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1$adapted(BeanIntrospector.scala:58)
> at 
> scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:240)
> ...
> at 
> net.jgp.books.sparkWithJava.ch01.CsvToDataframeApp.start(CsvToDataframeApp.java:37)
> at 
> net.jgp.books.sparkWithJava.ch01.CsvToDataframeApp.main(CsvToDataframeApp.java:21)
> {code}
> It works smoothly if I switch back to 2.11.
> The full example is available at 
> [https://github.com/jgperrin/net.jgp.books.sparkWithJava.ch01]. You can 
> modify pom.xml to easily change the Scala version in the properties section:
> {code:java}
> <properties>
>  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
>  <java.version>1.8</java.version>
>  <scala.version>2.11</scala.version>
>  <spark.version>2.4.0</spark.version>
> </properties>{code}
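> (Presumably the property is interpolated into the Spark artifact ids, so 
> flipping scala.version between 2.11 and 2.12 switches the whole dependency 
> set; an illustrative, unverified dependency entry would look like:)
> {code:xml}
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <!-- scala.version here is the Scala binary version suffix, e.g. 2.11 or 2.12 -->
>   <artifactId>spark-core_${scala.version}</artifactId>
>   <version>${spark.version}</version>
> </dependency>
> {code}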
> (P.S. It's my first bug submission, so I hope I did not mess up too much; be 
> tolerant if I did.)


