Re: preferredlocations for hadoopfsrelations based baseRelations

2020-06-29 Thread Steve Loughran
Here's a class which lets you provide a function on a row-by-row basis to declare location: https://github.com/hortonworks-spark/cloud-integration/blob/master/spark-cloud-integration/src/main/scala/org/apache/spark/cloudera/ParallelizedWithLocalityRDD.scala It needs to be in o.a.spark as something

Re: preferredlocations for hadoopfsrelations based baseRelations

2020-06-04 Thread ZHANG Wei
AFAICT, `FileScanRDD` invokes the `FilePartition::preferredLocations()` method, which is ordered by data size, to get the partition's preferred locations. If there are other criteria to sort by, I'm wondering if here[1] could be the place to add them. Or inheriting class `FilePartition` with overridden

preferredlocations for hadoopfsrelations based baseRelations

2020-06-04 Thread Nasrulla Khan Haris
Hi Spark developers, I have created a new format extending `FileFormat`. I see `getPreferredLocations` is available if a new custom RDD is created. Since `FileFormat` is based on `FileScanRDD`, which uses the readfile method to read a partitioned file, is there a way to add the desired preferredLocations?
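
For readers following the thread, here is a minimal, Spark-free sketch of the "ordered by data size" idea mentioned in the reply: sum the bytes each host holds for a partition's files, then prefer the hosts with the most data. `FileSlice`, `LocalityRanking`, and `rankHosts` are hypothetical names made up for this illustration, not Spark classes, and the default of three hosts, the `localhost` filter, and the tie-break by host name are assumptions of the sketch rather than a statement of what Spark does.

```scala
// Hypothetical stand-in for one file split and the hosts that store it.
// This is NOT Spark's PartitionedFile; it only carries what the sketch needs.
final case class FileSlice(path: String, length: Long, hosts: Seq[String])

object LocalityRanking {
  // Sum the bytes available on each host, then keep the hosts with the most data.
  // maxHosts = 3 and skipping "localhost" are assumptions made for this sketch.
  def rankHosts(files: Seq[FileSlice], maxHosts: Int = 3): Seq[String] = {
    val bytesPerHost: Map[String, Long] = files
      .flatMap(f => f.hosts.filter(_ != "localhost").map(h => h -> f.length))
      .groupBy(_._1)
      .map { case (host, pairs) => host -> pairs.map(_._2).sum }

    bytesPerHost.toSeq
      .sortBy { case (host, bytes) => (-bytes, host) } // largest first, ties by host name
      .take(maxHosts)
      .map(_._1)
  }
}
```

A different ordering (for example rack or cache awareness) would only change the `sortBy` key. In a custom RDD, a list like this could be what `getPreferredLocations` returns for a partition.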