[ 
https://issues.apache.org/jira/browse/SPARK-23114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333725#comment-16333725
 ] 

Felix Cheung edited comment on SPARK-23114 at 1/21/18 11:02 PM:
----------------------------------------------------------------

[~sameerag]

Here are some ideas for the release notes (which go on spark-website in the 
announcements).

For SparkR, new in 2.3.0:

SQL changes:

SQL functions, cubing & nested structure

collect_list, collect_set, split_string, repeat_string, rollup, cube,
explode_outer, posexplode_outer, %<=>%, !, not, create_array, create_map,
grouping_bit, grouping_id, input_file_name, alias, trunc, date_trunc,
map_keys, map_values, current_date, current_timestamp, trim/trimString,
dayofweek, unionByName

to_json (map or array of maps)

Data Source - multiLine (json/csv)

ML changes:

Decision Tree (regression and classification)

Constrained Logistic Regression
offset in SparkR GLM [https://github.com/apache/spark/pull/18831]
stringIndexerOrderType
handleInvalid (spark.svmLinear, spark.logit, spark.mlp, spark.naiveBayes,
spark.gbt, spark.decisionTree, spark.randomForest)

Structured Streaming (SS) changes:

Structured Streaming API for withWatermark, trigger (once, processingTime),
partitionBy

stream-stream join

Documentation:

major overhaul and simplification of the API doc for SQL functions


> Spark R 2.3 QA umbrella
> -----------------------
>
>                 Key: SPARK-23114
>                 URL: https://issues.apache.org/jira/browse/SPARK-23114
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Documentation, SparkR
>            Reporter: Joseph K. Bradley
>            Assignee: Felix Cheung
>            Priority: Critical
>
> This JIRA lists tasks for the next Spark release's QA period for SparkR.
> The list below gives an overview of what is involved, and the corresponding 
> JIRA issues are linked below that.
> h2. API
> * Audit new public APIs (from the generated html doc)
> ** relative to Spark Scala/Java APIs
> ** relative to popular R libraries
> h2. Documentation and example code
> * For new algorithms, create JIRAs for updating the user guide sections & 
> examples
> * Update Programming Guide
> * Update website


