Re: [discuss][data source v2] remove type parameter in DataReader/WriterFactory

2018-04-16 Thread Felix Cheung
Is it required for DataReader to support all known DataFormats? Hopefully not, as assumed by the 'throw' in the interface. Then how, specifically, are we going to express the capability of a given reader, i.e. its supported format(s), or specific support for each of "real-time data in row format, and
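A minimal Scala sketch of the shape under discussion, using hypothetical names (the real DataSourceV2 interfaces differ): a factory without a type parameter that advertises its format, and whose per-format create methods throw by default so an implementation only overrides what it actually supports.

    // Hypothetical names; a sketch of the proposal's shape, not Spark's actual API.
    sealed trait DataFormat
    object DataFormat {
      case object Row extends DataFormat
      case object ColumnarBatch extends DataFormat
    }

    trait DataReader[T] extends AutoCloseable {
      def next(): Boolean
      def get(): T
    }

    // Stand-ins for InternalRow / ColumnarBatch.
    trait RowLike
    trait BatchLike

    trait DataReaderFactory extends Serializable {
      // Capability is expressed by the factory, not by a type parameter.
      def dataFormat: DataFormat

      // Defaults throw, so a reader need not support every known DataFormat.
      def createRowDataReader(): DataReader[RowLike] =
        throw new UnsupportedOperationException("row format not supported")

      def createColumnarBatchDataReader(): DataReader[BatchLike] =
        throw new UnsupportedOperationException("columnar format not supported")
    }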

Re: Maintenance releases for SPARK-23852?

2018-04-16 Thread Xiao Li
Yes, that sounds good to me. We can upgrade both Parquet (1.8.2 to 1.8.3) and ORC (1.4.1 to 1.4.3) in our upcoming Spark 2.3.1 release. Thanks for your efforts, @Henry and @Dongjoon! Xiao 2018-04-16 14:41 GMT-07:00 Henry Robinson: > Seems like there aren't any objections. I'll pick

Re: Maintenance releases for SPARK-23852?

2018-04-16 Thread Henry Robinson
Seems like there aren't any objections. I'll pick this thread back up when a Parquet maintenance release has happened. Henry On 11 April 2018 at 14:00, Dongjoon Hyun wrote: > Great. > > If we can upgrade the parquet dependency from 1.8.2 to 1.8.3 in Apache > Spark

Re: Isolate 1 partition and perform computations

2018-04-16 Thread Thodoris Zois
Hello, thank you very much for your response, Anastasios! Today I think I made it work by dropping partitions in runJob or submitJob (I don't remember exactly which) in the DAGScheduler. If it doesn't work properly after some tests, I will follow your approach. Thank you, Thodoris > On 16 Apr 2018,
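For reference, the runJob route mentioned above is also reachable through the public API: SparkContext.runJob accepts an explicit list of partition ids. A minimal sketch of that approach (not necessarily what was done here):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("single-partition").getOrCreate()
    val sc = spark.sparkContext

    val rdd = sc.parallelize(Array.range(0, 1000), numSlices = 8)
    val partitionIndex = 0  // the single partition to compute on

    // Only the listed partition is scheduled; the other partitions are never computed.
    val result: Array[Int] = sc.runJob(
      rdd,
      (iter: Iterator[Int]) => iter.sum,
      Seq(partitionIndex))

    println(result.mkString(", "))  // one element: the sum of partition 0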

Re: Isolate 1 partition and perform computations

2018-04-16 Thread Anastasios Zouzias
Hi all, I think this is doable using the mapPartitionsWithIndex method of RDD. Example: val partitionIndex = 0 // Your favorite partition index here val rdd = spark.sparkContext.parallelize(Array.range(0, 1000)) // Replace elements of partitionIndex with [-10, .. ,0] val fixed =
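The preview above is cut off at "val fixed ="; a plausible completion of the sketch, following the intent of the comment (replace the chosen partition's elements with [-10, ..., 0] and leave the others untouched):

    val partitionIndex = 0  // your favorite partition index here
    val rdd = spark.sparkContext.parallelize(Array.range(0, 1000))

    // Replace elements of partitionIndex with [-10, ..., 0]
    val fixed = rdd.mapPartitionsWithIndex { (idx, iter) =>
      if (idx == partitionIndex) Iterator.range(-10, 1)  // new data for the isolated partition
      else iter                                          // other partitions pass through unchanged
    }

    fixed.take(20).foreach(println)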