[ https://issues.apache.org/jira/browse/SPARK-24273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482533#comment-16482533 ]

Jami Malikzade commented on SPARK-24273:
----------------------------------------

[~kiszk]

 

I was able to reproduce this with a small piece of code by filtering a column
so that 0 records are returned, as shown below: 'salary > 300' matches no
records, so the checkpoint has to write 0 records.

The steps are:

Prepare a salary.csv file in the bucket with the following content:

name,year,salary,month
John,2017,12.33,4
John,2018,55.114,5
Smith,2017,36.339,3
Smith,2018,45.36,6

 

Then execute the following Spark code:

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.when
import spark.implicits._

sc.setCheckpointDir("s3a://testbucket/tmp")

val testschema = StructType(Array(
  StructField("name", StringType, true),
  StructField("year", IntegerType, true),
  StructField("salary", FloatType, true),
  StructField("month", IntegerType, true)
))

val df = spark.read
  .option("header", "true")
  .option("sep", ",")
  .schema(testschema)
  .csv("s3a://testbucket/salary.csv")
  .filter('salary > 300)   // matches no rows, so 0 records are checkpointed
  .withColumn("month", when('name === "Smith", "6").otherwise("3"))
  .checkpoint()

df.show()

 

 

It will throw:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 11.0 failed 16 times, most recent failure: Lost task 0.15 in stage 11.0 
(TID 41, 172.16.0.15, executor 5): 
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS 
Service: Amazon S3, AWS Request ID: 
tx00000000000000018c023-005b02d1e7-513c871-default, AWS Error Code: 
InvalidRange, AWS Error Message: null, S3 Extended Request ID: 
513c871-default-default
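
A possible workaround until this is handled, just a sketch from my side and not a verified fix, could be to skip the checkpoint entirely when the filtered result is empty, so no zero-record checkpoint file ever has to be read back from S3:

// Workaround sketch (assumption, not a verified fix): only checkpoint when the
// filtered DataFrame actually contains rows.
val filtered = spark.read
  .option("header", "true")
  .option("sep", ",")
  .schema(testschema)
  .csv("s3a://testbucket/salary.csv")
  .filter('salary > 300)
  .withColumn("month", when('name === "Smith", "6").otherwise("3"))

// head(1) avoids a full count just to test for emptiness
val result = if (filtered.head(1).isEmpty) filtered else filtered.checkpoint()
result.show()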

> Failure while using .checkpoint method
> --------------------------------------
>
>                 Key: SPARK-24273
>                 URL: https://issues.apache.org/jira/browse/SPARK-24273
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>    Affects Versions: 2.3.0
>            Reporter: Jami Malikzade
>            Priority: Major
>
> We are getting following error:
> com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS 
> Service: Amazon S3, AWS Request ID: 
> tx000000000000000014126-005ae9bfd9-9ed9ac2-default, AWS Error Code: 
> InvalidRange, AWS Error Message: null, S3 Extended Request ID: 
> 9ed9ac2-default-default"
> when we use the checkpoint method as below:
> val streamBucketDF = streamPacketDeltaDF
>  .filter('timeDelta > maxGap && 'timeDelta <= 30000)
>  .withColumn("bucket", when('timeDelta <= mediumGap, "medium")
>  .otherwise("large")
>  )
>  .checkpoint()
> Do you have an idea how to prevent the invalid Range header from being sent, or how
> this can be worked around or fixed?
> Thanks.



