[ https://issues.apache.org/jira/browse/SPARK-23018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319417#comment-16319417 ]
Bryan Cutler commented on SPARK-23018:
--------------------------------------

I can submit a PR

> PySpark creatDataFrame causes Pandas warning of assignment to a copy of a reference
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-23018
>                 URL: https://issues.apache.org/jira/browse/SPARK-23018
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.0
>            Reporter: Bryan Cutler
>
> When calling {{SparkSession.createDataFrame}} with a Pandas DataFrame as
> input (with Arrow disabled) a Pandas warning is raised when the DataFrame is
> a slice:
> {noformat}
> In [1]: import numpy as np
>    ...: import pandas as pd
>    ...: pdf = pd.DataFrame(np.random.rand(100, 2))
>    ...:
> In [2]: df = spark.createDataFrame(pdf[:10])
> /home/bryan/git/spark/python/pyspark/sql/session.py:476: SettingWithCopyWarning:
> A value is trying to be set on a copy of a slice from a DataFrame.
> Try using .loc[row_indexer,col_indexer] = value instead
> See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
>   pdf[column] = s
> {noformat}
> This doesn't seem to cause a bug in this case, but might for others. It
> could be avoided by only assigning the series if it was a modified timestamp
> field.
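
The fix suggested at the end of the report (assign a series back only when a timestamp field was actually modified) might look roughly like the sketch below. This is not the code in session.py: `normalize_timestamps` and its `timezone` parameter are hypothetical names, the helper assumes unique column names, and copying the frame before the first assignment is an added assumption so that a slice passed in by the caller is never written to.

```python
import numpy as np
import pandas as pd


def normalize_timestamps(pdf, timezone="UTC"):
    """Hypothetical helper: make tz-aware timestamp columns tz-naive.

    Columns that are not tz-aware timestamps are never reassigned, and the
    input frame is copied once, right before the first assignment.
    """
    copied = False
    for column in pdf.columns:
        series = pdf[column]
        # Only tz-aware datetime columns need to be modified.
        if isinstance(series.dtype, pd.DatetimeTZDtype):
            if not copied:
                # Work on a private copy so the caller's slice is untouched.
                pdf = pdf.copy()
                copied = True
            # Convert to the target zone, then drop the tz information.
            pdf[column] = series.dt.tz_convert(timezone).dt.tz_localize(None)
    return pdf


# The frame from the report has no timestamp columns, so nothing is assigned
# and the slice is returned as-is.
pdf = pd.DataFrame(np.random.rand(100, 2))
unchanged = normalize_timestamps(pdf[:10])
```

With this shape, a slice that has no timestamp columns goes through without any assignment, so there is nothing for Pandas to warn about; a slice that does have one is copied first and the assignment lands on the copy.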