[ https://issues.apache.org/jira/browse/SPARK-26280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720000#comment-16720000 ]
Marco Gaido commented on SPARK-26280:
-------------------------------------

I'd say this is most likely a duplicate of https://issues.apache.org/jira/browse/SPARK-25497. It would be great to test and confirm, though.

> Spark will read entire CSV file even when limit is used
> -------------------------------------------------------
>
>                 Key: SPARK-26280
>                 URL: https://issues.apache.org/jira/browse/SPARK-26280
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.1
>            Reporter: Amir Bar-Or
>            Priority: Major
>
> When you read a CSV file as below, the parser still wastes time reading the
> entire file:
> var lineDF1 = spark.read
>   .format("com.databricks.spark.csv")
>   .option("header", "true") // read the header row
>   .option("mode", "DROPMALFORMED")
>   .option("delimiter", ",")
>   .option("inferSchema", "false")
>   .schema(line_schema)
>   .load(i_lineitem)
> lineDF1.limit(10)
>
> Even though a LocalLimit is created, this does not stop the FileScan and
> the parser from parsing the entire file. Is it possible to push the limit down
> and stop the parsing?
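
To make the question about pushing the limit down concrete, here is a minimal sketch in plain Scala. It is not Spark's FileScan or its CSV parser, and it makes no claim about how either SPARK-26280 or SPARK-25497 is handled internally. The names `LimitPushdownSketch`, `CountingParser`, and `run` are hypothetical, introduced only for illustration. The sketch contrasts applying a limit after every line has been parsed with applying it to a lazy iterator, so that parsing stops once enough rows exist.

```scala
// Sketch only: plain Scala collections, not Spark's FileScan or CSV parser.
object LimitPushdownSketch {
  // Hypothetical stand-in for a CSV parser; counts the lines it is asked to parse.
  final class CountingParser {
    var parsed: Int = 0
    def parse(line: String): Array[String] = {
      parsed += 1
      line.split(",", -1)
    }
  }

  // Returns (rows kept, lines parsed) for a limit of n rows.
  def run(lines: Seq[String], n: Int, pushDown: Boolean): (Int, Int) = {
    val parser = new CountingParser
    val rows =
      if (pushDown)
        // Limit inside the scan: the lazy iterator stops pulling lines after n rows.
        lines.iterator.map(parser.parse).take(n).toList
      else
        // Limit after the scan: a strict collection parses every line, then keeps n.
        lines.map(parser.parse).take(n).toList
    (rows.size, parser.parsed)
  }

  def main(args: Array[String]): Unit = {
    // Assumed sample data: 1000 two-column CSV lines held in a strict collection.
    val lines = (1 to 1000).map(i => s"$i,item$i")
    println(run(lines, 10, pushDown = false))
    println(run(lines, 10, pushDown = true))
  }
}
```

Both calls keep the same 10 rows. The difference is the second element of the returned pair, the number of lines the parser touched, which is the cost the report is asking about.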