[ https://issues.apache.org/jira/browse/SPARK-26280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719727#comment-16719727 ]
Hyukjin Kwon commented on SPARK-26280:
--------------------------------------

Is this CSV-specific, or does it also happen with other data sources?

> Spark will read entire CSV file even when limit is used
> -------------------------------------------------------
>
>                 Key: SPARK-26280
>                 URL: https://issues.apache.org/jira/browse/SPARK-26280
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.1
>            Reporter: Amir Bar-Or
>            Priority: Major
>
> When you read a CSV file as below, the parser still wastes time and reads the
> entire file:
>
>     var lineDF1 = spark.read
>       .format("com.databricks.spark.csv")
>       .option("header", "true") // read the header row
>       .option("mode", "DROPMALFORMED")
>       .option("delimiter", ",")
>       .option("inferSchema", "false")
>       .schema(line_schema)
>       .load(i_lineitem)
>     lineDF1.limit(10)
>
> Even though a LocalLimit is created, this does not stop the FileScan and
> the parser from parsing the entire file. Is it possible to push the limit
> down and stop the parsing?
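
To make the request concrete, here is a minimal plain-Scala sketch of what "pushing the limit down" into the parser would mean: the parser pulls lines lazily and stops once enough rows have been produced. This is an illustration only, not Spark's FileScan or CSV parser; `LimitPushdownSketch`, `parseWithLimit`, and the naive `split(",")` parsing are hypothetical names and simplifications introduced for this example.

```scala
// Hypothetical sketch: illustrates the idea of a limit-aware parser.
// It is NOT how Spark implements CSV reading or limit handling.
object LimitPushdownSketch {

  // Parses at most `limit` data rows from `lines` (first line is a header).
  // Returns the parsed rows and the number of lines that were actually parsed.
  def parseWithLimit(lines: Iterator[String], limit: Int): (List[Array[String]], Int) = {
    var parsed = 0
    val rows = lines
      .drop(1)                    // skip the header row
      .map { line =>
        parsed += 1               // count every line the "parser" touches
        line.split(",", -1)       // naive split; real CSV parsing handles quoting
      }
      .take(limit)                // the limit is applied before anything is forced
      .toList                     // forces evaluation of only the needed rows
    (rows, parsed)
  }

  def main(args: Array[String]): Unit = {
    // A synthetic 1000-row "file" produced lazily, standing in for a large CSV.
    val csv = Iterator("id,name") ++ Iterator.tabulate(1000)(i => s"$i,item$i")
    val (rows, parsed) = parseWithLimit(csv, 10)
    println(s"rows returned: ${rows.length}, lines parsed: $parsed")
  }
}
```

Because `Iterator.map` and `Iterator.take` are lazy, the counter only advances for rows that are actually requested, which is the behavior the reporter is asking for from the scan. In Spark itself, calling `explain()` on the limited DataFrame is one way to check whether the limit appears above or inside the file scan in the physical plan.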