The reader necessarily knows the number of partitions, since it's responsible for generating its output partitions in the first place. I won't speak for everyone, but it would make sense to me to pass in a Partitioning instance to the writer, since it's already part of the v2 interface through the reader's SupportsReportPartitioning.
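To make that suggestion concrete, here is a rough sketch of what I have in mind. It is an illustration, not the actual v2 API: the `Partitioning` interface below is a stand-in that models only the partition count, and `TopicCreatingWriter` and its `prepare` hook are hypothetical names invented for this example. A real connector would call a Kafka admin client where the sketch just records the request.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionAwareWriterSketch {

    /** Hypothetical stand-in for the reader-side Partitioning; only the partition count is modeled. */
    interface Partitioning {
        int numPartitions();
    }

    /** Hypothetical writer that is handed the input partitioning before any write task runs. */
    static final class TopicCreatingWriter {
        private final String topic;
        private final List<String> adminLog = new ArrayList<>();

        TopicCreatingWriter(String topic) {
            this.topic = topic;
        }

        /** Hypothetical hook: invoked once with the partitioning of the incoming data. */
        void prepare(Partitioning input) {
            int n = input.numPartitions();
            if (n <= 0) {
                throw new IllegalArgumentException("expected a positive partition count, got " + n);
            }
            // A real connector would create the topic through an admin client here;
            // this sketch only records what it would have asked for.
            adminLog.add("create topic=" + topic + " partitions=" + n);
        }

        List<String> adminLog() {
            return adminLog;
        }
    }

    public static void main(String[] args) {
        // The reader already knows how many partitions it produces.
        Partitioning fromReader = () -> 8;

        TopicCreatingWriter writer = new TopicCreatingWriter("events");
        writer.prepare(fromReader);
        System.out.println(writer.adminLog());
    }
}
```

The point of the design is that the writer only ever sees the partitioning, not the plan that produced it, so nothing here depends on internal plan classes.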
I don't think we can expose execution plans to the data source v2 interface; the exact Java structure of execution plans isn't stable across even maintenance releases. Even if we could, I don't really see what the use case would be: what information does the writer need that can't be made available through either the input data or the input partitioning? (The built-in Kafka sink, for example, handles metadata such as topic switching by just accepting the topic name as a column along with the data.)

On Wed, Mar 13, 2019 at 1:39 AM JOAQUIN GUANTER GONZALBEZ <joaquin.guantergonzal...@telefonica.com> wrote:

> I'd like to bump this. I agree with Carlos that there is very little
> information at the DataSourceWriter/DataSourceReader level. To me, ideally,
> the DataSourceWriter/Reader should have as much information as possible.
> Not only the number of partitions, but also ideally the whole execution
> plan.
>
> This would not only enable things like automatic creation of Kafka topics
> with the correct number of partitions (like Carlos mentioned), but it would
> also allow advanced DataSources that, for example, analyze the execution
> plan to choose the correct parameters to implement differential privacy.
>
> CC'ing in Ryan, since he is leading the DataSourceV2 workgroup (sorry I
> can't join the sync meetings, but I'm in CET and the time logistics
> of that meeting don't work for Europe).
>
> Ryan, do you think it would be a good idea to provide extra information at
> the DataSourceWriter/Reader level to enable more advanced datasources?
> Would a PR contribution with these changes be a welcome addition?
>
> Thanks,
> Ximo
>
> -----Original Message-----
> From: CARLOS DEL PRADO MOTA <carlos.delpradom...@telefonica.com>
> Sent: Thursday, March 7, 2019 10:19
> To: dev@spark.apache.org
> Subject: Partitions at DataSource API V2
>
> Hello, I'm Carlos del Prado, developer at Telefonica.
>
> We are working with Spark's DataSource API V2, building a custom Kafka
> connector that creates the topic upon write. In order to do that, we need
> to know the number of partitions before writing data in each partition, at
> the DataSourceWriter level.
>
> Is there any way for us to do that?
>
> Kind regards,
> Carlos.