[ https://issues.apache.org/jira/browse/SPARK-36024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388969#comment-17388969 ]
Steve Loughran commented on SPARK-36024:
----------------------------------------

Yes, you can change the example. For Hadoop we're trying to keep the Landsat file around, because removing it breaks regression testing of all old releases.

> Switch the datasource example due to the depreciation of the dataset
> --------------------------------------------------------------------
>
>                 Key: SPARK-36024
>                 URL: https://issues.apache.org/jira/browse/SPARK-36024
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation
>    Affects Versions: 3.1.2
>            Reporter: Leona Yoda
>            Priority: Trivial
>
> The S3 bucket that is used for an example in the "Integration with Cloud
> Infrastructures" document will be deleted on Jul 1, 2021.
> [https://registry.opendata.aws/landsat-8/]
> The dataset will move to another bucket, but that bucket requires the
> `--request-payer requester` option, so users have to pay the S3 cost.
> [https://registry.opendata.aws/usgs-landsat/]
>
> So I think it's better to change the datasource like this.
> [https://github.com/yoda-mon/spark/commit/cdb24acdbb57a429e5bf1729502653b91a600022]
>
> I chose [NYC Taxi data|https://registry.opendata.aws/nyc-tlc-trip-records-pds/]
> here for an example.
> Unlike the Landsat data it's not compressed, but it is just an example and there
> are several tutorials using Spark (e.g.
> [https://github.com/aws-samples/amazon-eks-apache-spark-etl-sample]).
>
> Read test result
> {code:java}
> scala> sc.textFile("s3a://nyc-tlc/misc/taxi _zone_lookup.csv").take(10).foreach(println)
> "LocationID","Borough","Zone","service_zone"
> 1,"EWR","Newark Airport","EWR"
> 2,"Queens","Jamaica Bay","Boro Zone"
> 3,"Bronx","Allerton/Pelham Gardens","Boro Zone"
> 4,"Manhattan","Alphabet City","Yellow Zone"
> 5,"Staten Island","Arden Heights","Boro Zone"
> 6,"Staten Island","Arrochar/Fort Wadsworth","Boro Zone"
> 7,"Queens","Astoria","Boro Zone"
> 8,"Queens","Astoria Park","Boro Zone"
> 9,"Queens","Auburndale","Boro Zone"
> {code}
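
The rows printed by the read test are quoted CSV with four columns. As a rough sketch only, the snippet below shows one way to turn such lines into typed records in plain Scala, assuming every data row follows the layout seen in the output above. `TaxiZone`, `TaxiZoneExample` and `parseZone` are hypothetical names introduced for illustration; they are not part of Spark or of the proposed documentation change.

```scala
// Hypothetical record type for one row of the lookup file; the field names
// mirror the CSV header shown in the read test.
final case class TaxiZone(locationId: Int, borough: String, zone: String, serviceZone: String)

object TaxiZoneExample {
  // Matches rows such as: 1,"EWR","Newark Airport","EWR"
  // Assumption: the first column is an unquoted integer and the other three are quoted.
  private val Row = """^(\d+),"([^"]*)","([^"]*)","([^"]*)"$""".r

  // Returns None for the header line or for any row that does not fit the layout.
  def parseZone(line: String): Option[TaxiZone] = line.trim match {
    case Row(id, borough, zone, serviceZone) =>
      Some(TaxiZone(id.toInt, borough, zone, serviceZone))
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    // Sample lines copied from the read test output.
    val sample = Seq(
      "\"LocationID\",\"Borough\",\"Zone\",\"service_zone\"",
      "1,\"EWR\",\"Newark Airport\",\"EWR\"",
      "2,\"Queens\",\"Jamaica Bay\",\"Boro Zone\""
    )
    // The header is dropped because it does not match the row pattern.
    sample.flatMap(parseZone).foreach(println)
  }
}
```

In spark-shell the same function could presumably be applied to the RDD from the read test with `flatMap(TaxiZoneExample.parseZone)`, which would skip the header line without special handling.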