The updates look great! Many places are now updated to the new APIs, but there still isn't a section on working with Datasets (most of the docs work with DataFrames). Are you planning on adding more? I am thinking of something that would address common questions like the one I posted on the user mailing list earlier today.

Should I take the discussion to your PR?

Pedro

On Fri, Jun 17, 2016 at 11:12 PM, Cheng Lian <lian.cs....@gmail.com> wrote:

> Hey Pedro,
>
> SQL programming guide is being updated. Here's the PR, but not merged yet:
> https://github.com/apache/spark/pull/13592
>
> Cheng
> On 6/17/16 9:13 PM, Pedro Rodriguez wrote:
>
> Hi All,
>
> At my workplace we are starting to use Datasets in 1.6.1 and even more
> with Spark 2.0 in place of Dataframes. I looked at the 1.6.1 documentation
> then the 2.0 documentation and it looks like not much time has been spent
> writing a Dataset guide/tutorial.
>
> Preview Docs:
> https://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/sql-programming-guide.html#creating-datasets
> Spark master docs:
> https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md
>
> I would like to spend the time to contribute an improvement to those docs
> with a more in depth examples of creating and using Datasets (eg using $ to
> select columns). Is this of value, and if so what should my next step be to
> get this going (create JIRA etc)?
>
> --
> Pedro Rodriguez
> PhD Student in Distributed Machine Learning | CU Boulder
> R&D Data Science Intern at Oracle Data Cloud
> UC Berkeley AMPLab Alumni
>
> ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
> Github: github.com/EntilZha | LinkedIn:
> https://www.linkedin.com/in/pedrorodriguezscience