Great, it's an easy fix. I will create a JIRA ticket and a pull request.

On Thu, Feb 2, 2017 at 2:13 PM, Michael Armbrust <mich...@databricks.com> wrote:
> That might be reasonable. At least I can't think of any problems with
> doing that.
>
> On Thu, Feb 2, 2017 at 7:39 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> Since a Dataset is a typed object, you ideally don't have to think about
>> field names.
>>
>> However, some operations on Dataset require you to provide a Column, for
>> example joinWith (and joinWith returns a strongly typed Dataset, not a
>> DataFrame). Once you have to provide a Column, you are back to thinking in
>> field names and worrying about duplicate field names, which can easily
>> happen in a Dataset without you realizing it.
>>
>> Under the hood, Dataset has a unique identifier for every column, as in
>> dataset.queryExecution.logical.output, but these are expressions
>> (attributes) that I cannot turn back into columns, because the
>> constructors for this are private in Spark.
>>
>> So how about adding Dataset.apply(i: Int): Column, which would let me pick
>> columns by position without having to worry about (duplicate) field names?
>> Then I could do something like:
>>
>> dataset.joinWith(otherDataset, dataset(0) === otherDataset(0), joinType)