[jira] [Commented] (SPARK-37174) WARN WindowExec: No Partition Defined is being printed 4 times.

Hyukjin Kwon (Jira) Tue, 02 Nov 2021 00:57:23 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-37174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437184#comment-17437184
 ]


Hyukjin Kwon commented on SPARK-37174:
--------------------------------------

This is related to the default index type; see also 
https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/options.html#default-index-type.
Spark 3.3 aims to remove such warnings.
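
As a rough sketch of what that options page describes: the "sequence" default index type is built with a Window function that has no partition spec, which is what makes WindowExec log this warning. Switching the option to "distributed" should avoid that Window, at the cost of index values that are no longer consecutive. The helper name use_distributed_default_index is hypothetical; set_option, get_option and "compute.default_index_type" are the pandas API on Spark option calls, and the module is passed in as an argument only so the helper can be read and tried without a Spark installation.

```python
def use_distributed_default_index(ps_module):
    """Switch the pandas-on-Spark default index type to 'distributed'.

    `ps_module` is assumed to be `pyspark.pandas` (hypothetical helper, not
    part of PySpark). Returns the previous setting so it can be restored.
    """
    option = "compute.default_index_type"
    previous = ps_module.get_option(option)
    ps_module.set_option(option, "distributed")
    return previous


# Usage, assuming PySpark 3.2+ is installed and `f01` is a Spark DataFrame:
#   import pyspark.pandas as ps
#   previous = use_distributed_default_index(ps)
#   pf01 = f01.to_pandas_on_spark()
#   ps.set_option("compute.default_index_type", previous)  # restore if needed
```

Whether this is acceptable depends on the workload: operations that rely on a consecutive 0..n-1 index behave differently with a "distributed" index.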

> WARN WindowExec: No Partition Defined is being printed 4 times. 
> ----------------------------------------------------------------
>
>                 Key: SPARK-37174
>                 URL: https://issues.apache.org/jira/browse/SPARK-37174
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Bjørn Jørgensen
>            Priority: Major
>
> Hi, I use this code:
> {code:python}
> import re
> import pyspark.pandas as ps
>
> f01 = spark.read.json("/home/test_files/falk/flatted110721/F01.json/*.json")
> pf01 = f01.to_pandas_on_spark()
> # Strip a trailing ":P" from every column name.
> pf01 = pf01.rename(columns=lambda x: re.sub(':P$', '', x))
> pf01["OBJECT_CONTRACT:DATE_PUBLICATION_NOTICE"] = ps.to_datetime(pf01["OBJECT_CONTRACT:DATE_PUBLICATION_NOTICE"])
> pf01.info(){code}
>  
>  Sometimes it prints:
>   
> {code:java}
>  21/10/31 20:38:04 WARN WindowExec: No Partition Defined for Window 
> operation! Moving all data to a single partition, this can cause serious 
> performance degradation.
>  21/10/31 20:38:04 WARN package: Truncated the string representation of a 
> plan since it was too large. This behavior can be adjusted by setting 
> 'spark.sql.debug.maxToStringFields'.
>  21/10/31 20:38:08 WARN WindowExec: No Partition Defined for Window 
> operation! Moving all data to a single partition, this can cause serious 
> performance degradation.
>  /opt/spark/python/pyspark/sql/pandas/conversion.py:214: PerformanceWarning: 
> DataFrame is highly fragmented.  This is usually the result of calling 
> `frame.insert` many times, which has poor performance.  Consider joining all 
> columns at once using pd.concat(axis=1) instead.  To get a de-fragmented 
> frame, use `newframe = frame.copy()`
>    df[column_name] = series
>  /opt/spark/python/pyspark/pandas/utils.py:967: UserWarning: `to_pandas` 
> loads all data into the driver's memory. It should only be used if the 
> resulting pandas Series is expected to be small.
>    warnings.warn(message, UserWarning)
>  21/10/31 20:38:16 WARN WindowExec: No Partition Defined for Window 
> operation! Moving all data to a single partition, this can cause serious 
> performance degradation.
>  21/10/31 20:38:18 WARN WindowExec: No Partition Defined for Window 
> operation! Moving all data to a single partition, this can cause serious 
> performance degradation.{code}
>  
>  and at other times it "just" prints:
>   
> {code:java}
>  21/10/31 21:24:13 WARN WindowExec: No Partition Defined for Window 
> operation! Moving all data to a single partition, this can cause serious 
> performance degradation.
>  21/10/31 21:24:16 WARN WindowExec: No Partition Defined for Window 
> operation! Moving all data to a single partition, this can cause serious 
> performance degradation.
>  21/10/31 21:24:22 WARN WindowExec: No Partition Defined for Window 
> operation! Moving all data to a single partition, this can cause serious 
> performance degradation.
>  21/10/31 21:24:24 WARN WindowExec: No Partition Defined for Window 
> operation! Moving all data to a single partition, this can cause serious 
> performance degradation.{code}
> Why does it print df[column_name] = series?
>
>  Can we remove /opt/spark/python/pyspark/pandas/utils.py:967: and 
> warnings.warn(message, UserWarning), and three of the four occurrences of 
> WARN WindowExec: No Partition Defined for Window operation! Moving all data 
> to a single partition, this can cause serious performance degradation.?
>  
>  
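
On the question of why lines such as "df[column_name] = series" and "warnings.warn(message, UserWarning)" appear: Python's warnings module prints the source line that raised the warning right after the "file:line: Category: message" header, so those are echoed source lines, not separate messages. A minimal sketch using only the standard library follows; the function names are hypothetical and the message text is shortened from the log above. It also shows a message filter that hides one Python-side warning. It does not affect the WARN WindowExec lines, which come from the JVM logger rather than from Python.

```python
import warnings


def format_like_python(message, category, filename, lineno, source_line):
    """Return the text Python's default formatter would emit for a warning."""
    return warnings.formatwarning(message, category, filename, lineno, line=source_line)


def count_visible_warnings():
    """Emit two warnings while ignoring the `to_pandas` one; count what is left."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # The most recently added filter is checked first, so this wins.
        warnings.filterwarnings("ignore", message=".*to_pandas.*", category=UserWarning)
        warnings.warn("`to_pandas` loads all data into the driver's memory.", UserWarning)
        warnings.warn("some other warning", UserWarning)
    return len(caught)


text = format_like_python(
    "`to_pandas` loads all data into the driver's memory.",
    UserWarning,
    "utils.py",
    967,
    "warnings.warn(message, UserWarning)",
)
print(text)  # header line, then the indented source line
visible = count_visible_warnings()  # only the unrelated warning is recorded
```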



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
