[jira] [Commented] (SPARK-25770) support SparkDataFrame pretty print
[ https://issues.apache.org/jira/browse/SPARK-25770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167324#comment-17167324 ]

S Daniel Zafar commented on SPARK-25770:
----------------------------------------

[~adrian555], what would your preferred print look like?

> support SparkDataFrame pretty print
> -----------------------------------
>
>                 Key: SPARK-25770
>                 URL: https://issues.apache.org/jira/browse/SPARK-25770
>             Project: Spark
>          Issue Type: Improvement
>          Components: SparkR
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2
>            Reporter: Weiqiang Zhuang
>            Priority: Minor
>
> This is for continuing discussion of a requirement raised in
> [https://github.com/apache/spark/pull/22455#discussion_r223197863].
>
> Summary:
> SparkDataFrame is an S4 object, and `show()` is the default method for
> displaying the data frame on screen. Currently the output is simply the
> string returned by a `showString()` call, which pre-formats the data frame
> and displays it as a table. This lacks the flexibility to re-format the
> output in a more user-friendly, prettier fashion, as seen in 1) the S3
> `print()` method, which allows arguments such as `quote` to control the
> output; and 2) external tools such as Jupyter R notebooks, which implement
> their own customized display.
>
> This Jira aims to explore a feasible solution that improves the screen
> output experience, both by supporting pretty printing from within the
> SparkR package and by offering a common hook for external tools to
> customize the display function.
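For concreteness, a minimal R sketch of the kind of hook being discussed. `createDataFrame()` and `showDF()` are real SparkR functions; the `sparkr.print.handler` option is invented purely for illustration and is not part of any SparkR release.

{code:r}
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)

# Current behavior: the table is pre-formatted by showString() on the JVM
# side and printed as a fixed-width block of text.
showDF(df, numRows = 5)

# Hypothetical hook along the lines discussed in this Jira: let users or
# external tools (e.g. a Jupyter R kernel) register their own display
# function. The option name "sparkr.print.handler" is invented here.
options(sparkr.print.handler = function(sdf) showDF(sdf, numRows = 5))
handler <- getOption("sparkr.print.handler", default = showDF)
handler(df)
{code}

A hook of this shape would let a notebook frontend render an HTML table while plain consoles keep the current text output.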
[jira] [Commented] (SPARK-30255) Support explain mode in SparkR df.explain
[ https://issues.apache.org/jira/browse/SPARK-30255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167316#comment-17167316 ]

S Daniel Zafar commented on SPARK-30255:
----------------------------------------

Hello, I would like to knock this one out as a first issue; it seems pretty
straightforward. I'm planning to copy the PySpark API directly, so that both
`extended` and `mode` are arguments, but if `extended` comes in as an object
of class character it is treated as `mode`. Does that sound like a good plan?
I have written it up but need a little guidance on how folks typically build
Spark for local testing. If it's okay for me to work on this, please assign
this task to me.

> Support explain mode in SparkR df.explain
> ------------------------------------------
>
>                 Key: SPARK-30255
>                 URL: https://issues.apache.org/jira/browse/SPARK-30255
>             Project: Spark
>          Issue Type: Improvement
>          Components: R, SQL
>    Affects Versions: 3.1.0
>            Reporter: Takeshi Yamamuro
>            Priority: Major
>
> This PR intends to support the explain modes implemented in SPARK-30200
> (#26829) for SparkR.
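A rough sketch of the argument handling the comment proposes, written as a standalone helper rather than the actual S4 method. The dispatch on `is.character(extended)` mirrors PySpark's `df.explain(extended=None, mode=None)`; this is the commenter's plan, not necessarily what was eventually merged.

{code:r}
library(SparkR)

# Proposed behavior: `extended` keeps working as a logical flag, but a
# character value ("simple", "extended", "codegen", "cost", "formatted")
# is treated as an explain mode, as in PySpark.
explainSketch <- function(x, extended = FALSE, mode = NULL) {
  if (is.character(extended)) {
    mode <- extended        # explainSketch(df, "cost") acts like mode = "cost"
    extended <- FALSE
  }
  if (!is.null(mode)) {
    # Dataset.explain(mode: String) exists on the JVM side as of Spark 3.0
    invisible(sparkR.callJMethod(x@sdf, "explain", mode))
  } else {
    explain(x, extended)    # existing SparkR code path
  }
}
{code}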
[jira] [Comment Edited] (SPARK-30817) SparkR ML algorithms parity
[ https://issues.apache.org/jira/browse/SPARK-30817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166571#comment-17166571 ]

S Daniel Zafar edited comment on SPARK-30817 at 7/28/20, 5:19 PM:
------------------------------------------------------------------

I would like to work on this issue, is that all right [~hyukjin.kwon]? It
would be my first.

was (Author: dan_z):
I would like to address this issue, is that all right [~hyukjin.kwon]?

> SparkR ML algorithms parity
> ----------------------------
>
>                 Key: SPARK-30817
>                 URL: https://issues.apache.org/jira/browse/SPARK-30817
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML, SparkR
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Maciej Szymkiewicz
>            Priority: Major
>
> As of 3.0, the following algorithms are missing from SparkR:
> * {{LinearRegression}}
> * {{FMRegressor}} (added to ML in 3.0)
> * {{FMClassifier}} (added to ML in 3.0)
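For reference, SparkR's existing ML wrappers follow a formula-based `spark.*` convention, and wrappers for the three missing algorithms would presumably take the same shape. The `spark.glm()` call below is a real, existing wrapper shown as the template; wrappers for the algorithms listed above did not exist at the time of this comment.

{code:r}
library(SparkR)
sparkR.session()

# iris columns are renamed on import (dots become underscores).
df <- createDataFrame(iris)

# Existing wrapper, shown as the template a LinearRegression /
# FMRegressor / FMClassifier wrapper would presumably follow.
model <- spark.glm(df, Sepal_Length ~ Sepal_Width, family = "gaussian")
summary(model)

preds <- predict(model, df)
head(select(preds, "Sepal_Length", "prediction"))
{code}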
[jira] [Commented] (SPARK-30817) SparkR ML algorithms parity
[ https://issues.apache.org/jira/browse/SPARK-30817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166571#comment-17166571 ]

S Daniel Zafar commented on SPARK-30817:
----------------------------------------

I would like to address this issue, is that all right [~hyukjin.kwon]?

> SparkR ML algorithms parity
> ----------------------------
>
>                 Key: SPARK-30817
>                 URL: https://issues.apache.org/jira/browse/SPARK-30817
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML, SparkR
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Maciej Szymkiewicz
>            Priority: Major
>
> As of 3.0, the following algorithms are missing from SparkR:
> * {{LinearRegression}}
> * {{FMRegressor}} (added to ML in 3.0)
> * {{FMClassifier}} (added to ML in 3.0)
[jira] [Commented] (SPARK-12172) Consider removing SparkR internal RDD APIs
[ https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166569#comment-17166569 ]

S Daniel Zafar commented on SPARK-12172:
----------------------------------------

My opinion is that it makes sense to keep these methods, since they exist in
PySpark. Removing basic things like `map` seems counterintuitive. The PR is
closed; should we close this as well?

> Consider removing SparkR internal RDD APIs
> -------------------------------------------
>
>                 Key: SPARK-12172
>                 URL: https://issues.apache.org/jira/browse/SPARK-12172
>             Project: Spark
>          Issue Type: Task
>          Components: SparkR
>            Reporter: Felix Cheung
>            Priority: Major
>
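For context, the methods in question live in SparkR's internal, non-exported RDD layer, reachable only through `:::`. The names below are assumptions based on the SparkR sources of this era and may differ across versions; treat this as an illustration of what the debate is about, not a supported API.

{code:r}
library(SparkR)
sparkR.session()

# Internal (non-exported) RDD API -- function names here are assumptions
# and may vary by SparkR version.
sc <- SparkR:::getSparkContext()
rdd <- SparkR:::parallelize(sc, 1:10)
doubled <- SparkR:::map(rdd, function(x) x * 2)   # the `map` discussed above
SparkR:::collectRDD(doubled)
{code}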
[jira] [Commented] (SPARK-20684) expose createOrReplaceGlobalTempView/createGlobalTempView and dropGlobalTempView in SparkR
[ https://issues.apache.org/jira/browse/SPARK-20684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166559#comment-17166559 ]

S Daniel Zafar commented on SPARK-20684:
----------------------------------------

The PR ([https://github.com/apache/spark/pull/17941]) was closed. I think we
can close this.

> expose createOrReplaceGlobalTempView/createGlobalTempView and
> dropGlobalTempView in SparkR
> --------------------------------------------------------------
>
>                 Key: SPARK-20684
>                 URL: https://issues.apache.org/jira/browse/SPARK-20684
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.2.0
>            Reporter: Hossein Falaki
>            Priority: Major
>
> This is a useful API that is not exposed in SparkR. It will help with
> moving data between languages within a single Spark application.
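To make the gap concrete: session-scoped temp views are exposed in SparkR today, while the global (cross-session) variant is reachable only through SQL. A small sketch using only existing SparkR functions:

{code:r}
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)

# Exposed today: session-scoped temp view.
createOrReplaceTempView(df, "faithful_tmp")

# Not exposed as an R function yet -- only reachable via SQL. Global temp
# views live in the reserved global_temp database and are visible to other
# sessions of the same application.
sql("CREATE GLOBAL TEMPORARY VIEW faithful_glb AS SELECT * FROM faithful_tmp")
head(sql("SELECT * FROM global_temp.faithful_glb"))
{code}

A Scala or Python session in the same application could then read `global_temp.faithful_glb`, which is the cross-language use case the description mentions.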
[jira] [Issue Comment Deleted] (SPARK-31137) Opportunity to simplify execution plan when passing empty dataframes to subtract()
[ https://issues.apache.org/jira/browse/SPARK-31137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

S Daniel Zafar updated SPARK-31137:
-----------------------------------
    Comment: was deleted

(was: Moving this to Databricks internal board.)

> Opportunity to simplify execution plan when passing empty dataframes to
> subtract()
> ------------------------------------------------------------------------
>
>                 Key: SPARK-31137
>                 URL: https://issues.apache.org/jira/browse/SPARK-31137
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 2.4.5
>            Reporter: S Daniel Zafar
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Execution plans are the same when passing an empty versus a non-empty
> DataFrame to PySpark's subtract() call:
> {code:python}
> df.subtract(regDf){code}
> yields the same physical plan as:
> {code:python}
> df.subtract(emptyDf){code}
> Since the operation (EXCEPT DISTINCT in Spark SQL) requires a sort on both
> DataFrames, skipping it could yield significant performance speed-ups: if
> the incoming DataFrame is empty, no processing should happen.
>
> Should be a quick fix for a seasoned committer.
[jira] [Resolved] (SPARK-31137) Opportunity to simplify execution plan when passing empty dataframes to subtract()
[ https://issues.apache.org/jira/browse/SPARK-31137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

S Daniel Zafar resolved SPARK-31137.
------------------------------------
    Resolution: Won't Do

Moving to Databricks internal board.

> Opportunity to simplify execution plan when passing empty dataframes to
> subtract()
> ------------------------------------------------------------------------
>
>                 Key: SPARK-31137
>                 URL: https://issues.apache.org/jira/browse/SPARK-31137
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 2.4.5
>            Reporter: S Daniel Zafar
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Execution plans are the same when passing an empty versus a non-empty
> DataFrame to PySpark's subtract() call:
> {code:python}
> df.subtract(regDf){code}
> yields the same physical plan as:
> {code:python}
> df.subtract(emptyDf){code}
> Since the operation (EXCEPT DISTINCT in Spark SQL) requires a sort on both
> DataFrames, skipping it could yield significant performance speed-ups: if
> the incoming DataFrame is empty, no processing should happen.
>
> Should be a quick fix for a seasoned committer.
[jira] [Commented] (SPARK-31137) Opportunity to simplify execution plan when passing empty dataframes to subtract()
[ https://issues.apache.org/jira/browse/SPARK-31137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17058872#comment-17058872 ]

S Daniel Zafar commented on SPARK-31137:
----------------------------------------

Moving this to Databricks internal board.

> Opportunity to simplify execution plan when passing empty dataframes to
> subtract()
> ------------------------------------------------------------------------
>
>                 Key: SPARK-31137
>                 URL: https://issues.apache.org/jira/browse/SPARK-31137
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 2.4.5
>            Reporter: S Daniel Zafar
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Execution plans are the same when passing an empty versus a non-empty
> DataFrame to PySpark's subtract() call:
> {code:python}
> df.subtract(regDf){code}
> yields the same physical plan as:
> {code:python}
> df.subtract(emptyDf){code}
> Since the operation (EXCEPT DISTINCT in Spark SQL) requires a sort on both
> DataFrames, skipping it could yield significant performance speed-ups: if
> the incoming DataFrame is empty, no processing should happen.
>
> Should be a quick fix for a seasoned committer.
[jira] [Created] (SPARK-31137) Opportunity to simplify execution plan when passing empty dataframes to subtract()
S Daniel Zafar created SPARK-31137:
--------------------------------------

             Summary: Opportunity to simplify execution plan when passing
                      empty dataframes to subtract()
                 Key: SPARK-31137
                 URL: https://issues.apache.org/jira/browse/SPARK-31137
             Project: Spark
          Issue Type: Improvement
          Components: PySpark, SQL
    Affects Versions: 2.4.5
            Reporter: S Daniel Zafar

Execution plans are the same when passing an empty versus a non-empty
DataFrame to PySpark's subtract() call:

{code:python}
df.subtract(regDf){code}

yields the same physical plan as:

{code:python}
df.subtract(emptyDf){code}

Since the operation (EXCEPT DISTINCT in Spark SQL) requires a sort on both
DataFrames, skipping it could yield significant performance speed-ups: if
the incoming DataFrame is empty, no processing should happen.

Should be a quick fix for a seasoned committer.
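To reproduce the observation from R (the issue itself is PySpark; SparkR's `except()` is the counterpart of `subtract()`, both compiling to EXCEPT DISTINCT), a small sketch where `limit(df, 0)` stands in for an empty DataFrame:

{code:r}
library(SparkR)
sparkR.session()

df <- createDataFrame(data.frame(id = 1:5))
emptyDf <- limit(df, 0)   # empty DataFrame with the same schema

# Both plans have the same join-based EXCEPT DISTINCT shape; nothing
# short-circuits when the right-hand side is known to be empty, which is
# the missed optimization described above.
explain(except(df, df))
explain(except(df, emptyDf))
{code}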