[jira] [Comment Edited] (SPARK-16303) Update SQL examples and programming guide for Scala and Java language bindings

Cheng Lian (JIRA) Fri, 08 Jul 2016 00:30:07 -0700

    [https://issues.apache.org/jira/browse/SPARK-16303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367345#comment-15367345]


Cheng Lian edited comment on SPARK-16303 at 7/8/16 7:28 AM:
------------------------------------------------------------

Thanks for working on this! I'd suggest sending out the PR first so that people 
can comment on it. If you think it's still a work in progress, you may add a 
{{\[WIP\]}} tag to the PR title.

bq. ... I suggest having everything till the 'Data Sources' section in one 
single source file. ... I suggest using separate methods for each meaningful 
block. ...

Totally agree.
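For reference, here is a minimal sketch of that layout, assuming Spark 2.0's {{SparkSession}} API. The object name {{SparkSQLGuideSketch}}, the method names and the toy data are all hypothetical. Each method stands for one guide section and returns a value only so the sketch is easy to check. The real examples would normally leave {{master}} to {{spark-submit}}.

{code}
import org.apache.spark.sql.SparkSession

// Hypothetical object and method names: only the one-file,
// one-method-per-block layout is the point of this sketch.
object SparkSQLGuideSketch {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("Spark SQL guide sketch")
      .master("local[*]") // local mode so the sketch runs without a cluster
      .getOrCreate()

    runBasicDataFrameExample(spark)
    runDatasetCreationExample(spark)

    spark.stop()
  }

  // Block 1: basic untyped DataFrame operations.
  def runBasicDataFrameExample(spark: SparkSession): Long = {
    import spark.implicits._
    val df = Seq(("Michael", 29L), ("Andy", 30L), ("Justin", 19L)).toDF("name", "age")
    df.printSchema()
    // Number of people older than 21
    df.filter($"age" > 21).count()
  }

  // Block 2: creating and transforming a typed Dataset.
  def runDatasetCreationExample(spark: SparkSession): Array[Int] = {
    import spark.implicits._
    Seq(1, 2, 3).toDS().map(_ + 1).collect()
  }
}
{code}

Each section method takes the shared {{SparkSession}}, so {{import spark.implicits._}} can be repeated locally and every block stays self-contained when it is extracted into the guide.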

bq. ... As far as I read, it is impossible to overlap examples in the plugin 
that we use to extract code snippets from the source files.

Actually, I've added support for overlapping snippets in [PR 
#13972|https://github.com/apache/spark/pull/13972]. Please check [this PR 
comment|https://github.com/apache/spark/pull/13972#issuecomment-229543341] for 
more details. This was motivated precisely by the imports issue you mentioned.
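As a rough illustration, assuming the labeled {{$example on:<label>$}} / {{$example off:<label>$}} comment markers described in that PR comment, an import needed by two snippets can simply sit inside both labels. The labels {{create_df}} and {{untyped_ops}}, the object name and the data are made up for this sketch. The markers are ordinary comments, so the code compiles and runs unchanged.

{code}
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical object and snippet labels, for illustration only.
object OverlappingSnippetsSketch {

  def run(spark: SparkSession): (Long, Long) = {
    // $example on:create_df$
    // $example on:untyped_ops$
    // This import is needed by both snippets, so it sits inside both labels.
    import spark.implicits._
    // $example off:untyped_ops$
    val df: DataFrame = Seq(("Alice", 30), ("Bob", 25)).toDF("name", "age")
    // $example off:create_df$

    // $example on:untyped_ops$
    val olderThan26 = df.filter($"age" > 26)
    // $example off:untyped_ops$

    (df.count(), olderThan26.count())
  }
}
{code}

With overlapping labels, the extracted {{untyped_ops}} snippet would contain both the import and the filter, while {{create_df}} contains the import and the DataFrame creation, without duplicating the import in the source file.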

bq. I noticed that the current java version is 1.7 in the parent pom. Is it 
possible to update the examples submodule to 1.8? I believe that lambdas will 
simplify the Java code and make it more readable.

I agree that Java 8 features can make writing Java code a lot easier. 
However, I believe the default Jenkins PR builder still uses Java 7, so 
example code written in Java 8 may hit compilation errors unless you apply 
some Maven profile tricks. On the other hand, even if we add Java 8 examples, 
Java 7 examples are still necessary since Java 7 is still quite popular. I'd 
suggest sticking with Java 7 for this ticket and adding Java 8 examples as a 
follow-up.

bq. What is the correct way to load RDDs? There are different alternatives. For 
instance, via {{spark.sparkContext}}, or via DataFrames/Datasets. I assume that 
the first way makes more sense in section "Interoperating with RDDs" rather 
than creating DataFrames/Datasets, getting RDDs and then converting back.

Yeah, the first approach makes more sense for the "Interoperating with RDDs" section.
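A minimal sketch of that approach, assuming Spark 2.0: the RDD comes straight from {{spark.sparkContext}} and is then converted with {{toDF()}} using reflection on a case class. The three in-memory records are made up for the sketch; the real example would more likely read a file with {{spark.sparkContext.textFile(...)}}. {{Person}} and {{RddInteropSketch}} are hypothetical names.

{code}
import org.apache.spark.sql.SparkSession

// Hypothetical record type. It is declared outside any method so that
// Spark can derive an encoder for it by reflection.
case class Person(name: String, age: Long)

object RddInteropSketch {

  def run(spark: SparkSession): Long = {
    import spark.implicits._

    // Create the RDD directly from the SparkContext instead of going
    // through a DataFrame and calling .rdd on it.
    val lines = spark.sparkContext.parallelize(Seq("Michael, 29", "Andy, 30", "Justin, 19"))

    // Parse each line into a Person, then let reflection infer the schema.
    val peopleDF = lines
      .map(_.split(","))
      .map(attrs => Person(attrs(0), attrs(1).trim.toLong))
      .toDF()

    // Query the converted data with SQL: count the teenagers.
    peopleDF.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age BETWEEN 13 AND 19").count()
  }
}
{code}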

bq. Is it fine to re-use encoders?

Yes.
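For example, a single encoder value can be passed to several operations. This is a minimal sketch assuming Spark 2.0. {{EncoderReuseSketch}} and the data are hypothetical. {{Encoders.STRING}} is passed explicitly only to make the reuse visible; the implicit string encoder from {{spark.implicits._}} would work just as well.

{code}
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

// Hypothetical object name and data, for illustration only.
object EncoderReuseSketch {

  // One encoder instance, defined once and shared below.
  val stringEncoder: Encoder[String] = Encoders.STRING

  def run(spark: SparkSession): (Seq[String], Seq[String]) = {
    import spark.implicits._
    val df = Seq(("Michael", 29L), ("Andy", 30L), ("Justin", 19L)).toDF("name", "age")

    // Both map calls receive the same encoder through the implicit parameter list.
    val names = df.map(row => row.getString(0))(stringEncoder)
    val labels = df.map(row => "Name: " + row.getAs[String]("name"))(stringEncoder)

    (names.collect().toSeq, labels.collect().toSeq)
  }
}
{code}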

bq. If I use the {{getValuesMap\[T\]()}} method, then I will have a Dataset of 
{{Map\[String, T\]}} as a result. It seems that Maps are unsupported right now 
in Datasets.

We do support {{Dataset\[Map\[K, V\]\]}}, but there are no predefined implicit 
encoders for it in {{SQLImplicits}} because the number of permutations of all 
common key/value data types is too large. Users will have to define them 
explicitly, e.g.:

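{code}
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

// No predefined encoders exist for these Map types, so declare them as implicits in scope
implicit val e1: Encoder[Map[Int, String]] = ExpressionEncoder()
implicit val e2: Encoder[Map[Long, Double]] = ExpressionEncoder()
{code}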
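Putting that together with {{getValuesMap}}, here is a minimal sketch assuming a Spark version whose {{ExpressionEncoder}} handles {{Map}} as described above. Both columns are strings so that {{getValuesMap\[String\]}} yields a {{Map\[String, String\]}}. Mixed column types would need a value type such as {{Any}}, which {{ExpressionEncoder}} cannot handle; that case would call for something like {{Encoders.kryo}} instead. {{MapEncoderSketch}} and the data are hypothetical. {{spark.implicits._}} is deliberately not imported, so the locally declared implicit is the only candidate encoder.

{code}
import org.apache.spark.sql.{Encoder, SparkSession}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

// Hypothetical object name and data, for illustration only.
object MapEncoderSketch {

  def run(spark: SparkSession): Array[Map[String, String]] = {
    // Build a DataFrame without importing spark.implicits._
    val df = spark
      .createDataFrame(Seq(("Alice", "Paris"), ("Bob", "Berlin")))
      .toDF("name", "city")

    // No predefined encoder exists for Map[String, String], so declare one.
    implicit val mapEncoder: Encoder[Map[String, String]] = ExpressionEncoder()

    // Each Row becomes a Map from column name to value.
    df.map(row => row.getValuesMap[String](Seq("name", "city"))).collect()
  }
}
{code}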




> Update SQL examples and programming guide for Scala and Java language bindings
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-16303
>                 URL: https://issues.apache.org/jira/browse/SPARK-16303
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, Examples
>    Affects Versions: 2.0.0
>            Reporter: Cheng Lian
>            Assignee: Anton Okolnychyi
>
> We need to update the SQL example code under the {{examples}} sub-project, and 
> then replace hard-coded snippets in the SQL programming guide with snippets 
> automatically extracted from the actual source files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

