Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21625#discussion_r197676056
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala ---
    @@ -573,32 +578,6 @@ object DataSourceReadBenchmark {
               }
             }
     
    -        /*
    -        Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
    -        Partitioned Table:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    -        --------------------------------------------------------------------------------------------
    --- End diff ---
    
    oh, I hit a bug in CSV parsing while updating this benchmark...
    ```
    scala> val dir = "/tmp/spark-csv/csv"
    scala> spark.range(10).selectExpr("id % 2 AS p", "id").write.mode("overwrite").partitionBy("p").csv(dir)
    scala> spark.read.csv(dir).selectExpr("sum(p)").collect()
    18/06/25 13:12:51 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 5)
    java.lang.NullPointerException
            at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.org$apache$spark$sql$execution$datasources$csv$UnivocityParser$$convert(UnivocityParser.scala:197)
            at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.parse(UnivocityParser.scala:190)
            at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$5.apply(UnivocityParser.scala:309)
            at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$5.apply(UnivocityParser.scala:309)
            at org.apache.spark.sql.execution.datasources.FailureSafeParser.parse(FailureSafeParser.scala:61)
            ...
    ```
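    If the NPE is only hit when the projection consists solely of partition columns (which is what `sum(p)` above exercises), a possible interim workaround is to keep at least one data column in the query so the CSV required schema stays non-empty. This is a hedged sketch, not a confirmed fix; `_c0` is assumed to be the inferred name of the lone data column written by the repro:
    ```
    scala> val dir = "/tmp/spark-csv/csv"
    scala> spark.range(10).selectExpr("id % 2 AS p", "id").write.mode("overwrite").partitionBy("p").csv(dir)
    scala> // referencing the data column _c0 keeps the parser's required schema non-empty
    scala> spark.read.csv(dir).selectExpr("sum(p)", "count(_c0)").collect()
    ```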

