[ https://issues.apache.org/jira/browse/SPARK-31463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091205#comment-17091205 ]
Steven Moy commented on SPARK-31463:
------------------------------------

Hi [~hyukjin.kwon],

What is Spark's recommended path for introducing C/C++ code? I have been following SQLite and DuckDB; their approach is to inline the dependency (bringing the code into the tree when the license is compatible). Or would it be better to support simdjson as a completely separate DataSourceV2 implementation?

simdjson is licensed under the Apache License as well: [https://github.com/simdjson/simdjson/blob/master/LICENSE]

> Enhance JsonDataSource by replacing jackson with simdjson
> ---------------------------------------------------------
>
>                 Key: SPARK-31463
>                 URL: https://issues.apache.org/jira/browse/SPARK-31463
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Steven Moy
>            Priority: Minor
>
> I came across this VLDB paper: [https://arxiv.org/pdf/1902.08318.pdf] on how
> to improve JSON reading speed. We use Spark to process terabytes of JSON, so
> we are looking for ways to improve JSON parsing speed.
>
> [https://lemire.me/blog/2020/03/31/we-released-simdjson-0-3-the-fastest-json-parser-in-the-world-is-even-better/]
>
> [https://github.com/simdjson/simdjson/issues/93]
>
> Is anyone in the open source community interested in leading the effort to
> integrate simdjson into the Spark JSON data source API?