[ https://issues.apache.org/jira/browse/SPARK-24273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482533#comment-16482533 ]
Jami Malikzade commented on SPARK-24273:
----------------------------------------

[~kiszk] I was able to reproduce this with a small piece of code by filtering a column so that 0 records are returned, as below. The filter 'salary > 300' matches no records, so the checkpoint writes 0 records.

Steps: prepare a salary.csv file in the bucket like:

name,year,salary,month
John,2017,12.33,4
John,2018,55.114,5
Smith,2017,36.339,3
Smith,2018,45.36,6

And execute the following Spark code (in spark-shell; the imports are needed outside the shell's defaults):

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

sc.setCheckpointDir("s3a://testbucket/tmp")

val testschema = StructType(Array(
  StructField("name", StringType, true),
  StructField("year", IntegerType, true),
  StructField("salary", FloatType, true),
  StructField("month", IntegerType, true)
))

val df = spark.read
  .option("header", "true")
  .option("sep", ",")
  .schema(testschema)
  .csv("s3a://testbucket/salary.csv")
  .filter('salary > 300)   // matches no rows
  .withColumn("month", when('name === "Smith", "6").otherwise("3"))
  .checkpoint()

df.show()

It will throw:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 16 times, most recent failure: Lost task 0.15 in stage 11.0 (TID 41, 172.16.0.15, executor 5): com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS Service: Amazon S3, AWS Request ID: tx00000000000000018c023-005b02d1e7-513c871-default, AWS Error Code: InvalidRange, AWS Error Message: null, S3 Extended Request ID: 513c871-default-default

> Failure while using .checkpoint method
> --------------------------------------
>
>                 Key: SPARK-24273
>                 URL: https://issues.apache.org/jira/browse/SPARK-24273
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>    Affects Versions: 2.3.0
>            Reporter: Jami Malikzade
>            Priority: Major
>
> We are getting the following error:
> com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS Service: Amazon S3, AWS Request ID: tx000000000000000014126-005ae9bfd9-9ed9ac2-default, AWS Error Code: InvalidRange, AWS Error Message: null, S3 Extended Request ID: 9ed9ac2-default-default
> when we
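The repro above suggests the 416 only surfaces when the checkpoint contains zero records. A possible workaround sketch (an assumption, not a verified fix) is to skip the S3 checkpoint when the filter matches no rows and fall back to localCheckpoint(), which keeps the data on executor-local storage instead of s3a:// and is available since Spark 2.3:

```scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Hypothetical workaround sketch, assuming the InvalidRange error only
// occurs when an empty checkpoint file is read back from S3.
val filtered = spark.read
  .option("header", "true")
  .schema(testschema)
  .csv("s3a://testbucket/salary.csv")
  .filter('salary > 300)
  .withColumn("month", when('name === "Smith", "6").otherwise("3"))

// take(1) avoids a full count; if no rows match, truncate the lineage
// locally instead of writing a zero-record checkpoint to S3.
val df =
  if (filtered.take(1).isEmpty) filtered.localCheckpoint()
  else filtered.checkpoint()
```

Note that localCheckpoint() trades fault tolerance for speed (the checkpoint is lost if an executor dies), so this only papers over the empty-file case rather than fixing the S3A range-request behavior.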
> use the checkpoint method as below:
> val streamBucketDF = streamPacketDeltaDF
>   .filter('timeDelta > maxGap && 'timeDelta <= 30000)
>   .withColumn("bucket", when('timeDelta <= mediumGap, "medium")
>     .otherwise("large")
>   )
>   .checkpoint()
> Do you have an idea how to prevent the invalid range header from being sent, or how it can be worked around or fixed?
> Thanks.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)