[jira] [Commented] (SPARK-38988) Pandas API - "PerformanceWarning: DataFrame is highly fragmented." get printed many times.
[ https://issues.apache.org/jira/browse/SPARK-38988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528933#comment-17528933 ]

Apache Spark commented on SPARK-38988:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/36367

> Pandas API - "PerformanceWarning: DataFrame is highly fragmented." get printed many times.
> ---
>
> Key: SPARK-38988
> URL: https://issues.apache.org/jira/browse/SPARK-38988
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.3.0, 3.4.0
> Reporter: Bjørn Jørgensen
> Priority: Major
> Attachments: Untitled.html, info.txt, warning printed.txt
>
> I add a file and a notebook with the info msg I get when I run df.info()
> Spark master build from 13.04.22.
> df.shape
> (763300, 224)

--
This message was sent by Atlassian Jira (v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38988) Pandas API - "PerformanceWarning: DataFrame is highly fragmented." get printed many times.
[ https://issues.apache.org/jira/browse/SPARK-38988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528459#comment-17528459 ]

Xinrong Meng commented on SPARK-38988:
--

Thank you for raising that! I will try muting the warnings for now.
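For readers following the thread, here is a minimal sketch of what muting a warning category can look like with Python's standard `warnings` module. It is not taken from the pull request and makes no claim about how Spark handles this. The helper names (`emit_fragmentation_warnings`, `run_muted`, `count_warnings`) are hypothetical, and the emitter is only a stand-in for a call that raises the warning repeatedly. If pandas is not installed, a placeholder `PerformanceWarning` class is used so the sketch still runs.

```python
import warnings

try:
    # The warning category reported in this issue.
    from pandas.errors import PerformanceWarning
except ImportError:
    # Assumption: placeholder so the sketch runs without pandas installed.
    class PerformanceWarning(Warning):
        pass


def emit_fragmentation_warnings(times):
    """Hypothetical stand-in for a call that warns `times` times."""
    for _ in range(times):
        warnings.warn("DataFrame is highly fragmented.", PerformanceWarning)


def run_muted(func, *args, **kwargs):
    """Call func with PerformanceWarning ignored, then restore the filters."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=PerformanceWarning)
        return func(*args, **kwargs)


def count_warnings(func, *args, **kwargs):
    """Return how many PerformanceWarnings func lets through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func(*args, **kwargs)
    return sum(issubclass(w.category, PerformanceWarning) for w in caught)


if __name__ == "__main__":
    print(count_warnings(emit_fragmentation_warnings, 13))
    print(count_warnings(run_muted, emit_fragmentation_warnings, 13))
```

Scoping the filter with `warnings.catch_warnings()` keeps the suppression local to one call, so other warnings and later code are unaffected.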
[jira] [Commented] (SPARK-38988) Pandas API - "PerformanceWarning: DataFrame is highly fragmented." get printed many times.
[ https://issues.apache.org/jira/browse/SPARK-38988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527432#comment-17527432 ]

Bjørn Jørgensen commented on SPARK-38988:
-

I added a new file, "warning printed.txt". It shows that the warning depends on the DataFrame size.

If you have a DataFrame with

Int64Index: 34 entries, 0 to 33
Data columns (total 37 columns):

the warning won't get printed. If the DataFrame has

Int64Index: 109 entries, 0 to 108
Data columns (total 112 columns):

then the warning is printed 13 times.
[jira] [Commented] (SPARK-38988) Pandas API - "PerformanceWarning: DataFrame is highly fragmented." get printed many times.
[ https://issues.apache.org/jira/browse/SPARK-38988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527263#comment-17527263 ]

Hyukjin Kwon commented on SPARK-38988:
--

cc [~XinrongM] and [~itholic] FYI