[ https://issues.apache.org/jira/browse/SPARK-16303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366905#comment-15366905 ]

Anton Okolnychyi commented on SPARK-16303:
------------------------------------------

[~lian cheng] 
I would like to share some updates and get initial feedback on some points.

The code can be found here:
https://github.com/apache/spark/compare/master...aokolnychyi:sql_guide_update_prototype

1. On one hand, it would be nice to split the whole SQL guide into several 
source files; on the other hand, too many source files are not good either. 
Based on what I have seen in the aforementioned prototype, I suggest keeping 
everything up to the 'Data Sources' section in a single source file. However, 
it is still nice to have some modularity within that file, so I suggest a 
separate method for each meaningful block. This avoids naming conflicts and 
makes navigation within the file easier. This idea is already present in the 
code that I shared.
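To illustrate the structure I have in mind (the object and method names below are placeholders, not the actual prototype code):

```scala
// Placeholder skeleton: one private method per guide section, all in a
// single source file. Each method gets its own scope, so snippet labels
// and local variable names never clash across sections.
object SqlGuideExamples {

  def main(args: Array[String]): Unit = {
    runBasicDataFrameExample()
    runDatasetCreationExample()
    runInferSchemaExample()
  }

  private def runBasicDataFrameExample(): Unit = {
    // snippets for the 'Getting Started' section go here
  }

  private def runDatasetCreationExample(): Unit = {
    // snippets for 'Creating Datasets' go here
  }

  private def runInferSchemaExample(): Unit = {
    // snippets for 'Inferring the Schema Using Reflection' go here
  }
}
```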

2. Imports. First of all, should Java imports follow the same style as Scala 
ones? I also noticed that imports appear inside some examples. If a single 
source file covers several code snippets, this becomes a problem: as far as I 
can tell, the plugin that we use to extract code snippets from the source 
files does not allow examples to overlap.
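If I read the plugin correctly, snippets are delimited with special comments in the source file, roughly like this (the labels after the colon are my assumption about how several snippets per file would be distinguished):

```scala
// Sketch of how snippets could be marked for extraction. Because the guide
// pulls the text between a matching on/off pair, the regions must be
// disjoint -- an import placed inside one region cannot also belong to
// another, which is exactly the overlap problem above.
object InferSchemaExample {

  // $example on:schema_inferring$
  case class Person(name: String, age: Long)
  // $example off:schema_inferring$

  def run(): Unit = {
    // $example on:create_ds$
    // ... code shown verbatim in the guide ...
    // $example off:create_ds$
  }
}
```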

3. I noticed that the current Java version in the parent pom is 1.7. Is it 
possible to update the examples submodule to 1.8? I believe lambdas would 
simplify the Java code and make it more readable.

4. What is the correct way to obtain RDDs? There are several alternatives: 
via spark.sparkContext, or by creating DataFrames/Datasets and extracting 
their RDDs. In the 'Interoperating with RDDs' section, the first approach 
seems to make more sense than creating DataFrames/Datasets, getting their 
RDDs, and then converting back.

5. Is it fine to re-use encoders?
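By re-use I mean creating an encoder once and passing the same instance to several independent operations (a minimal sketch, assuming a local SparkSession):

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

object EncoderReuseSketch {

  def run(): (Long, Long) = {
    val spark = SparkSession.builder()
      .appName("encoder-reuse")
      .master("local[*]")
      .getOrCreate()

    // Created once...
    val longEncoder: Encoder[Long] = Encoders.scalaLong

    // ...and passed explicitly to two independent transformations.
    val ds1 = spark.range(5).map(x => x * 2L)(longEncoder)
    val ds2 = spark.range(3).map(x => x + 1L)(longEncoder)

    val counts = (ds1.count(), ds2.count())
    spark.stop()
    counts
  }
}
```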

6. If I use the getValuesMap[T]() method, the result is a Dataset of 
Map[String, T]. It seems that Map types are currently unsupported in 
Datasets.
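A minimal reproduction of what I mean (assuming a local SparkSession; the Kryo encoder at the end is only my workaround idea, not a proposal for the guide):

```scala
import org.apache.spark.sql.{Encoders, Row, SparkSession}

object ValuesMapSketch {

  def run(): Map[String, Long] = {
    val spark = SparkSession.builder()
      .appName("values-map")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("Ann", 30L), ("Bob", 25L)).toDF("name", "age")

    // Extracting a Map from a single Row works fine...
    val first: Row = df.head()
    val values: Map[String, Long] = first.getValuesMap[Long](Seq("age"))

    // ...the problem is only that there is no built-in encoder for Map,
    // so a Dataset[Map[String, Long]] needs an explicit (e.g. Kryo)
    // encoder, which stores the values as opaque binary.
    val asMaps =
      df.map(_.getValuesMap[Long](Seq("age")))(Encoders.kryo[Map[String, Long]])
    asMaps.count()

    spark.stop()
    values
  }
}
```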

> Update SQL examples and programming guide for Scala and Java language bindings
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-16303
>                 URL: https://issues.apache.org/jira/browse/SPARK-16303
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, Examples
>    Affects Versions: 2.0.0
>            Reporter: Cheng Lian
>            Assignee: Anton Okolnychyi
>
> We need to update SQL examples code under the {{examples}} sub-project, and 
> then replace hard-coded snippets in the SQL programming guide with snippets 
> automatically extracted from actual source files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
