[jira] [Commented] (SPARK-37181) pyspark.pandas.read_csv() should support latin-1 encoding
[ https://issues.apache.org/jira/browse/SPARK-37181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444117#comment-17444117 ]

Chuck Connell commented on SPARK-37181:
---------------------------------------

That would be a good solution, just convert latin-1 silently to ISO-8859-1.

> pyspark.pandas.read_csv() should support latin-1 encoding
> ---------------------------------------------------------
>
>                 Key: SPARK-37181
>                 URL: https://issues.apache.org/jira/browse/SPARK-37181
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Chuck Connell
>            Priority: Major
>
> {{In regular pandas, you can say read_csv(encoding='latin-1'). This encoding
> is not recognized in pyspark.pandas. You have to use Windows-1252 instead,
> which is almost the same but not identical.}}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
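The suggested mapping amounts to an alias lookup: in Python's own codec registry, "latin-1" already resolves to ISO-8859-1, so treating them as the same codec is safe. A quick stdlib check (nothing here is Spark-specific):

```python
import codecs

# "latin-1" and "ISO-8859-1" resolve to the same codec in Python's registry
assert codecs.lookup("latin-1").name == codecs.lookup("ISO-8859-1").name
print(codecs.lookup("latin-1").name)  # canonical name: iso8859-1
```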
[jira] [Comment Edited] (SPARK-37198) pyspark.pandas read_csv() and to_csv() should handle local files
[ https://issues.apache.org/jira/browse/SPARK-37198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17438789#comment-17438789 ]

Chuck Connell edited comment on SPARK-37198 at 11/4/21, 3:42 PM:
----------------------------------------------------------------

There are many hints/techtips on the Internet which say that {{[file://local_path|file://local_path/]}} already works to read and write local files from a Spark cluster. But in my testing (from Databricks) this is not true. I have never gotten it to work. If there is already a way to read/write local files, please say the exact, tested method to do so.

was (Author: chconnell):
There are many hints/techtips on the Internet which say that {{file://local_path}} already works to read and write local files from a Spark cluster. But in my testing (from Databricks) this is not true. I have never gotten it to work. If there is already a way to read/write local files, please say the exact, tested method to do so.

> pyspark.pandas read_csv() and to_csv() should handle local files
> ----------------------------------------------------------------
>
>                 Key: SPARK-37198
>                 URL: https://issues.apache.org/jira/browse/SPARK-37198
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Chuck Connell
>            Priority: Major
>
> Pandas programmers who move their code to Spark would like to import and
> export text files to and from their local disk. I know there are technical
> hurdles to this (since Spark is usually in a cluster that does not know where
> your local computer is) but it would really help code migration.
> For read_csv() and to_csv(), the syntax {{*file://c:/Temp/my_file.csv*}} (or
> something like this) should import and export to the local disk on Windows.
> Similarly for Mac and Linux.
[jira] [Commented] (SPARK-37198) pyspark.pandas read_csv() and to_csv() should handle local files
[ https://issues.apache.org/jira/browse/SPARK-37198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17438789#comment-17438789 ]

Chuck Connell commented on SPARK-37198:
---------------------------------------

There are many hints/techtips on the Internet which say that {{file://local_path}} already works to read and write local files from a Spark cluster. But in my testing (from Databricks) this is not true. I have never gotten it to work. If there is already a way to read/write local files, please say the exact, tested method to do so.

> pyspark.pandas read_csv() and to_csv() should handle local files
> ----------------------------------------------------------------
>
>                 Key: SPARK-37198
>                 URL: https://issues.apache.org/jira/browse/SPARK-37198
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Chuck Connell
>            Priority: Major
>
> Pandas programmers who move their code to Spark would like to import and
> export text files to and from their local disk. I know there are technical
> hurdles to this (since Spark is usually in a cluster that does not know where
> your local computer is) but it would really help code migration.
> For read_csv() and to_csv(), the syntax {{*file://c:/Temp/my_file.csv*}} (or
> something like this) should import and export to the local disk on Windows.
> Similarly for Mac and Linux.
[jira] [Updated] (SPARK-37197) PySpark pandas recent issues from chconnell
[ https://issues.apache.org/jira/browse/SPARK-37197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuck Connell updated SPARK-37197:
----------------------------------
    Description:
SPARK-37180 PySpark.pandas should support __version__
SPARK-37181 pyspark.pandas.read_csv() should support latin-1 encoding
SPARK-37183 pyspark.pandas.DataFrame.map() should support .fillna()
SPARK-37184 pyspark.pandas should support DF["column"].str.split("some_suffix").str[0]
SPARK-37186 pyspark.pandas should support tseries.offsets
SPARK-37187 pyspark.pandas fails to create a histogram of one column from a large DataFrame
SPARK-37188 pyspark.pandas histogram accepts the title option but does not add a title to the plot
SPARK-37189 pyspark.pandas histogram accepts the range option but does not use it
SPARK-37198 pyspark.pandas read_csv() and to_csv() should handle local files

  was:
SPARK-37180 PySpark.pandas should support __version__
SPARK-37181 pyspark.pandas.read_csv() should support latin-1 encoding
SPARK-37183 pyspark.pandas.DataFrame.map() should support .fillna()
SPARK-37184 pyspark.pandas should support DF["column"].str.split("some_suffix").str[0]
SPARK-37186 pyspark.pandas should support tseries.offsets
SPARK-37187 pyspark.pandas fails to create a histogram of one column from a large DataFrame
SPARK-37188 pyspark.pandas histogram accepts the title option but does not add a title to the plot
SPARK-37189 pyspark.pandas histogram accepts the range option but does not use it

> PySpark pandas recent issues from chconnell
> -------------------------------------------
>
>                 Key: SPARK-37197
>                 URL: https://issues.apache.org/jira/browse/SPARK-37197
>             Project: Spark
>          Issue Type: Umbrella
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Chuck Connell
>            Priority: Major
>
> SPARK-37180 PySpark.pandas should support __version__
> SPARK-37181 pyspark.pandas.read_csv() should support latin-1 encoding
> SPARK-37183 pyspark.pandas.DataFrame.map() should support .fillna()
> SPARK-37184 pyspark.pandas should support DF["column"].str.split("some_suffix").str[0]
> SPARK-37186 pyspark.pandas should support tseries.offsets
> SPARK-37187 pyspark.pandas fails to create a histogram of one column from a large DataFrame
> SPARK-37188 pyspark.pandas histogram accepts the title option but does not add a title to the plot
> SPARK-37189 pyspark.pandas histogram accepts the range option but does not use it
> SPARK-37198 pyspark.pandas read_csv() and to_csv() should handle local files
[jira] [Created] (SPARK-37198) pyspark.pandas read_csv() and to_csv() should handle local files
Chuck Connell created SPARK-37198:
-------------------------------------

             Summary: pyspark.pandas read_csv() and to_csv() should handle local files
                 Key: SPARK-37198
                 URL: https://issues.apache.org/jira/browse/SPARK-37198
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.2.0
            Reporter: Chuck Connell

Pandas programmers who move their code to Spark would like to import and export text files to and from their local disk. I know there are technical hurdles to this (since Spark is usually in a cluster that does not know where your local computer is) but it would really help code migration.

For read_csv() and to_csv(), the syntax {{*file://c:/Temp/my_file.csv*}} (or something like this) should import and export to the local disk on Windows. Similarly for Mac and Linux.
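For comparison, the local-disk round trip the issue asks for is a one-liner in plain pandas. A minimal sketch (the file name and columns are illustrative, not from the issue):

```python
import os
import tempfile

import pandas as pd

# Round-trip a DataFrame through a local CSV file, as plain pandas allows
df = pd.DataFrame({"county": ["Essex", "Suffolk"], "deaths": [10, 20]})
path = os.path.join(tempfile.mkdtemp(), "my_file.csv")

df.to_csv(path, index=False)   # write to local disk
back = pd.read_csv(path)       # read it back

assert back.equals(df)
```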
[jira] [Comment Edited] (SPARK-37189) pyspark.pandas histogram accepts the range option but does not use it
[ https://issues.apache.org/jira/browse/SPARK-37189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437329#comment-17437329 ]

Chuck Connell edited comment on SPARK-37189 at 11/2/21, 1:18 PM:
----------------------------------------------------------------

Ok, will do.

was (Author: chconnell):
Ok, will do. FYI, getting covid shot today, so I may be tired for a few days.

> pyspark.pandas histogram accepts the range option but does not use it
> ---------------------------------------------------------------------
>
>                 Key: SPARK-37189
>                 URL: https://issues.apache.org/jira/browse/SPARK-37189
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Chuck Connell
>            Priority: Major
>
> In pyspark.pandas if you write a line like this
> {quote}DF.plot.hist(bins=30, range=[0, 20], title="US Counties -- DeathsPer100k (<20)")
> {quote}
> it compiles and runs, but the plot does not respect the range. All the values
> are shown.
> The workaround is to create a new DataFrame that pre-selects just the rows
> you want, but the line above should work also.
[jira] [Created] (SPARK-37197) PySpark pandas recent issues from chconnell
Chuck Connell created SPARK-37197:
-------------------------------------

             Summary: PySpark pandas recent issues from chconnell
                 Key: SPARK-37197
                 URL: https://issues.apache.org/jira/browse/SPARK-37197
             Project: Spark
          Issue Type: Umbrella
          Components: PySpark
    Affects Versions: 3.2.0
            Reporter: Chuck Connell

SPARK-37180 PySpark.pandas should support __version__
SPARK-37181 pyspark.pandas.read_csv() should support latin-1 encoding
SPARK-37183 pyspark.pandas.DataFrame.map() should support .fillna()
SPARK-37184 pyspark.pandas should support DF["column"].str.split("some_suffix").str[0]
SPARK-37186 pyspark.pandas should support tseries.offsets
SPARK-37187 pyspark.pandas fails to create a histogram of one column from a large DataFrame
SPARK-37188 pyspark.pandas histogram accepts the title option but does not add a title to the plot
SPARK-37189 pyspark.pandas histogram accepts the range option but does not use it
[jira] [Commented] (SPARK-37189) pyspark.pandas histogram accepts the range option but does not use it
[ https://issues.apache.org/jira/browse/SPARK-37189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437329#comment-17437329 ]

Chuck Connell commented on SPARK-37189:
---------------------------------------

Ok, will do. FYI, getting covid shot today, so I may be tired for a few days.

> pyspark.pandas histogram accepts the range option but does not use it
> ---------------------------------------------------------------------
>
>                 Key: SPARK-37189
>                 URL: https://issues.apache.org/jira/browse/SPARK-37189
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Chuck Connell
>            Priority: Major
>
> In pyspark.pandas if you write a line like this
> {quote}DF.plot.hist(bins=30, range=[0, 20], title="US Counties -- DeathsPer100k (<20)")
> {quote}
> it compiles and runs, but the plot does not respect the range. All the values
> are shown.
> The workaround is to create a new DataFrame that pre-selects just the rows
> you want, but the line above should work also.
[jira] [Updated] (SPARK-37189) pyspark.pandas histogram accepts the range option but does not use it
[ https://issues.apache.org/jira/browse/SPARK-37189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuck Connell updated SPARK-37189:
----------------------------------
    Description:
In pyspark.pandas if you write a line like this
{quote}DF.plot.hist(bins=30, range=[0, 20], title="US Counties -- DeathsPer100k (<20)")
{quote}
it compiles and runs, but the plot does not respect the range. All the values are shown.

The workaround is to create a new DataFrame that pre-selects just the rows you want, but the line above should work also.

  was:
In pyspark.pandas if you write a line like this
{quote}DF.plot.hist(bins=20, title="US Counties -- FullVaxPer100")
{quote}
it compiles and runs, but the plot has no title.

> pyspark.pandas histogram accepts the range option but does not use it
> ---------------------------------------------------------------------
>
>                 Key: SPARK-37189
>                 URL: https://issues.apache.org/jira/browse/SPARK-37189
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Chuck Connell
>            Priority: Major
>
> In pyspark.pandas if you write a line like this
> {quote}DF.plot.hist(bins=30, range=[0, 20], title="US Counties -- DeathsPer100k (<20)")
> {quote}
> it compiles and runs, but the plot does not respect the range. All the values
> are shown.
> The workaround is to create a new DataFrame that pre-selects just the rows
> you want, but the line above should work also.
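The pre-selection workaround described in the issue can be sketched in plain pandas terms (the column name and values here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"DeathsPer100k": [1.0, 5.0, 12.0, 19.5, 25.0, 40.0]})

# Pre-select rows in the desired range, then plot the subset;
# subset.plot.hist(bins=30) would then only see values in [0, 20]
subset = df[df["DeathsPer100k"].between(0, 20)]

assert len(subset) == 4  # 25.0 and 40.0 are excluded
```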
[jira] [Created] (SPARK-37189) CLONE - pyspark.pandas histogram accepts the title option but does not add a title to the plot
Chuck Connell created SPARK-37189:
-------------------------------------

             Summary: CLONE - pyspark.pandas histogram accepts the title option but does not add a title to the plot
                 Key: SPARK-37189
                 URL: https://issues.apache.org/jira/browse/SPARK-37189
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.2.0
            Reporter: Chuck Connell

In pyspark.pandas if you write a line like this
{quote}DF.plot.hist(bins=20, title="US Counties -- FullVaxPer100")
{quote}
it compiles and runs, but the plot has no title.
[jira] [Updated] (SPARK-37189) pyspark.pandas histogram accepts the range option but does not use it
[ https://issues.apache.org/jira/browse/SPARK-37189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuck Connell updated SPARK-37189:
----------------------------------
    Summary: pyspark.pandas histogram accepts the range option but does not use it  (was: CLONE - pyspark.pandas histogram accepts the title option but does not add a title to the plot)

> pyspark.pandas histogram accepts the range option but does not use it
> ---------------------------------------------------------------------
>
>                 Key: SPARK-37189
>                 URL: https://issues.apache.org/jira/browse/SPARK-37189
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Chuck Connell
>            Priority: Major
>
> In pyspark.pandas if you write a line like this
> {quote}DF.plot.hist(bins=20, title="US Counties -- FullVaxPer100")
> {quote}
> it compiles and runs, but the plot has no title.
[jira] [Created] (SPARK-37188) pyspark.pandas histogram accepts the title option but does not add a title to the plot
Chuck Connell created SPARK-37188:
-------------------------------------

             Summary: pyspark.pandas histogram accepts the title option but does not add a title to the plot
                 Key: SPARK-37188
                 URL: https://issues.apache.org/jira/browse/SPARK-37188
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.2.0
            Reporter: Chuck Connell

In pyspark.pandas if you write a line like this
{quote}DF.plot.hist(bins=20, title="US Counties -- FullVaxPer100")
{quote}
it compiles and runs, but the plot has no title.
[jira] [Created] (SPARK-37187) pyspark.pandas fails to create a histogram of one column from a large DataFrame
Chuck Connell created SPARK-37187:
-------------------------------------

             Summary: pyspark.pandas fails to create a histogram of one column from a large DataFrame
                 Key: SPARK-37187
                 URL: https://issues.apache.org/jira/browse/SPARK-37187
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.2.0
            Reporter: Chuck Connell

When trying to create a histogram from one column of a large DataFrame, pyspark.pandas fails. So this line
{quote}DF.plot.hist(column="FullVaxPer100", bins=20)  # there are many other columns
{quote}
yields this error
{quote}cannot resolve 'least(min(EndDate), min(EndDeaths), min(`STATE-COUNTY`), min(StartDate), min(StartDeaths), min(POPESTIMATE2020), min(ST_ABBR), min(VaxStartDate), min(Series_Complete_Yes_Start), min(Administered_Dose1_Recip_Start), min(VaxEndDate), min(Series_Complete_Yes_End), min(Administered_Dose1_Recip_End), min(Deaths), min(Series_Complete_Yes_Mid), min(Administered_Dose1_Recip_Mid), min(FullVaxPer100), min(OnePlusVaxPer100), min(DeathsPer100k))' due to data type mismatch: The expressions should all have the same type, got LEAST(timestamp, bigint, string, timestamp, bigint, bigint, string, timestamp, bigint, bigint, timestamp, bigint, bigint, bigint, double, double, double, double, double).;
{quote}
The odd thing is that pyspark.pandas seems to be operating on all the columns when only one is needed.

As a workaround, you can first create a one-column DataFrame that selects just the field you want, then make a histogram of that. But the command above should work also.

I can supply the complete program and datasets that demonstrate the error.
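The workaround described above, selecting a single column before plotting, keeps mixed-type columns out of the aggregation entirely. A small pandas sketch (column names echo the report; the data is made up):

```python
import pandas as pd

# A frame with mixed column types, like the one in the report
df = pd.DataFrame({
    "ST_ABBR": ["MA", "NY", "VT"],        # string column
    "FullVaxPer100": [61.5, 58.2, 70.1],  # numeric column to plot
})

# Select just the column of interest as a one-column DataFrame first;
# one_col.plot.hist(bins=20) would then only touch the numeric column
one_col = df[["FullVaxPer100"]]

assert list(one_col.columns) == ["FullVaxPer100"]
```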
[jira] [Created] (SPARK-37186) pyspark.pandas should support tseries.offsets
Chuck Connell created SPARK-37186:
-------------------------------------

             Summary: pyspark.pandas should support tseries.offsets
                 Key: SPARK-37186
                 URL: https://issues.apache.org/jira/browse/SPARK-37186
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.2.0
            Reporter: Chuck Connell

In regular pandas you can use pandas.offsets to create a time delta. This allows a line like
{quote}this_period_start = OVERALL_START_DATE + pd.offsets.Day(NN)
{quote}
But this does not work in pyspark.pandas. There are good workarounds, such as datetime.timedelta(days=NN), but pandas programmers would like to move code to pyspark without changing it.
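In plain pandas, the offset form and the timedelta workaround mentioned above give the same result; a minimal sketch (the dates are illustrative):

```python
import datetime

import pandas as pd

start = pd.Timestamp("2021-01-01")

a = start + pd.offsets.Day(7)              # pandas offset form
b = start + datetime.timedelta(days=7)     # stdlib workaround

assert a == b == pd.Timestamp("2021-01-08")
```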
[jira] [Issue Comment Deleted] (SPARK-37182) pyspark.pandas.to_numeric() should support the errors option
[ https://issues.apache.org/jira/browse/SPARK-37182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuck Connell updated SPARK-37182:
----------------------------------
    Comment: was deleted

(was: https://issues.apache.org/jira/browse/SPARK-36609)

> pyspark.pandas.to_numeric() should support the errors option
> ------------------------------------------------------------
>
>                 Key: SPARK-37182
>                 URL: https://issues.apache.org/jira/browse/SPARK-37182
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Chuck Connell
>            Priority: Major
>
> In regular pandas you can say to_numeric(errors='coerce'). But the errors
> option is not recognized by pyspark.pandas.
> FYI, the errors option is recognized by pyspark.pandas.to_datetime()
[jira] [Resolved] (SPARK-37182) pyspark.pandas.to_numeric() should support the errors option
[ https://issues.apache.org/jira/browse/SPARK-37182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuck Connell resolved SPARK-37182.
-----------------------------------
    Resolution: Duplicate

https://issues.apache.org/jira/browse/SPARK-36609

> pyspark.pandas.to_numeric() should support the errors option
> ------------------------------------------------------------
>
>                 Key: SPARK-37182
>                 URL: https://issues.apache.org/jira/browse/SPARK-37182
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Chuck Connell
>            Priority: Major
>
> In regular pandas you can say to_numeric(errors='coerce'). But the errors
> option is not recognized by pyspark.pandas.
> FYI, the errors option is recognized by pyspark.pandas.to_datetime()
[jira] [Commented] (SPARK-37182) pyspark.pandas.to_numeric() should support the errors option
[ https://issues.apache.org/jira/browse/SPARK-37182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17436992#comment-17436992 ]

Chuck Connell commented on SPARK-37182:
---------------------------------------

Duplicate of https://issues.apache.org/jira/browse/SPARK-36609

> pyspark.pandas.to_numeric() should support the errors option
> ------------------------------------------------------------
>
>                 Key: SPARK-37182
>                 URL: https://issues.apache.org/jira/browse/SPARK-37182
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Chuck Connell
>            Priority: Major
>
> In regular pandas you can say to_numeric(errors='coerce'). But the errors
> option is not recognized by pyspark.pandas.
> FYI, the errors option is recognized by pyspark.pandas.to_datetime()
[jira] [Created] (SPARK-37184) pyspark.pandas should support DF["column"].str.split("some_suffix").str[0]
Chuck Connell created SPARK-37184:
-------------------------------------

             Summary: pyspark.pandas should support DF["column"].str.split("some_suffix").str[0]
                 Key: SPARK-37184
                 URL: https://issues.apache.org/jira/browse/SPARK-37184
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.2.0
            Reporter: Chuck Connell

In regular pandas you can say
{quote}DF["column"] = DF["column"].str.split("suffix").str[0]
{quote}
in order to strip off a suffix. With pyspark.pandas, this syntax does not work. You have to say something like
{quote}DF["column"] = DF["column"].str.replace("suffix", '', 1)
{quote}
which works fine if the suffix only appears once at the end, but is not really the same.
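The two approaches diverge whenever the separator appears more than once, which is the "not really the same" point above. A plain-pandas illustration (the values are made up):

```python
import pandas as pd

s = pd.Series(["alpha_x_x", "beta_x"])

split_way = s.str.split("_x").str[0]        # everything before the first "_x"
replace_way = s.str.replace("_x", "", n=1)  # removes only one occurrence

assert list(split_way) == ["alpha", "beta"]
assert list(replace_way) == ["alpha_x", "beta"]  # differs when "_x" repeats
```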
[jira] [Created] (SPARK-37183) pyspark.pandas.DataFrame.map() should support .fillna()
Chuck Connell created SPARK-37183:
-------------------------------------

             Summary: pyspark.pandas.DataFrame.map() should support .fillna()
                 Key: SPARK-37183
                 URL: https://issues.apache.org/jira/browse/SPARK-37183
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.2.0
            Reporter: Chuck Connell

In regular pandas you can say
{quote}DF["new_column"] = DF["column"].map(some_map).fillna(DF["column"])
{quote}
in order to use the existing value if the mapping key is not found. But this does not work in pyspark.pandas.
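The map-then-fillna idiom in plain pandas looks like this; unmapped keys become NaN and fillna restores the original value (the mapping here is illustrative):

```python
import pandas as pd

s = pd.Series(["a", "b", "c"])
some_map = {"a": "A"}

# "b" and "c" are not in the map, so map() yields NaN for them;
# fillna(s) falls back to the original values
out = s.map(some_map).fillna(s)

assert list(out) == ["A", "b", "c"]
```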
[jira] [Created] (SPARK-37182) pyspark.pandas.to_numeric() should support the errors option
Chuck Connell created SPARK-37182:
-------------------------------------

             Summary: pyspark.pandas.to_numeric() should support the errors option
                 Key: SPARK-37182
                 URL: https://issues.apache.org/jira/browse/SPARK-37182
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.2.0
            Reporter: Chuck Connell

In regular pandas you can say to_numeric(errors='coerce'). But the errors option is not recognized by pyspark.pandas.

FYI, the errors option is recognized by pyspark.pandas.to_datetime()
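The requested behavior in plain pandas: errors='coerce' turns unparseable values into NaN instead of raising. A minimal sketch (the values are illustrative):

```python
import pandas as pd

raw = pd.Series(["1", "2.5", "oops"])

# Unparseable entries become NaN rather than raising a ValueError
nums = pd.to_numeric(raw, errors="coerce")

assert nums.isna().tolist() == [False, False, True]
assert nums.iloc[0] == 1.0
```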
[jira] [Created] (SPARK-37181) pyspark.pandas.read_csv() should support latin-1 encoding
Chuck Connell created SPARK-37181:
-------------------------------------

             Summary: pyspark.pandas.read_csv() should support latin-1 encoding
                 Key: SPARK-37181
                 URL: https://issues.apache.org/jira/browse/SPARK-37181
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.2.0
            Reporter: Chuck Connell

{{In regular pandas, you can say read_csv(encoding='latin-1'). This encoding is not recognized in pyspark.pandas. You have to use Windows-1252 instead, which is almost the same but not identical.}}
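The "almost the same but not identical" point is concrete: latin-1 and Windows-1252 differ only in bytes 0x80-0x9F, where Windows-1252 puts printable punctuation and latin-1 has C1 control characters. A plain-Python illustration:

```python
# 0x93/0x94 are curly quotes under Windows-1252, control chars under latin-1
raw = b"\x93quoted\x94"

assert raw.decode("cp1252") == "\u201cquoted\u201d"  # smart quotes
assert raw.decode("latin-1") == "\x93quoted\x94"     # C1 control characters
```

Because latin-1 maps all 256 byte values, decoding never fails, so the mismatch shows up as wrong characters rather than an error.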
[jira] [Updated] (SPARK-37180) PySpark.pandas should support __version__
[ https://issues.apache.org/jira/browse/SPARK-37180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuck Connell updated SPARK-37180:
----------------------------------
    Description:
In regular pandas you can say
{quote}pd.__version__
{quote}
to get the pandas version number. PySpark pandas should support the same.

  was:
In regular pandas you can say
{quote}{{pd.__version__}}{quote}
to get the pandas version number. PySpark pandas should support the same.

> PySpark.pandas should support __version__
> -----------------------------------------
>
>                 Key: SPARK-37180
>                 URL: https://issues.apache.org/jira/browse/SPARK-37180
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Chuck Connell
>            Priority: Major
>
> In regular pandas you can say
> {quote}pd.__version__
> {quote}
> to get the pandas version number. PySpark pandas should support the same.
[jira] [Updated] (SPARK-37180) PySpark.pandas should support __version__
[ https://issues.apache.org/jira/browse/SPARK-37180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chuck Connell updated SPARK-37180:
----------------------------------
    Description:
In regular pandas you can say
{quote}{{pd.__version__}}{quote}
to get the pandas version number. PySpark pandas should support the same.

  was:
In regular pandas you can say pd.__version__ to get the pandas version number. PySpark pandas should support the same.

> PySpark.pandas should support __version__
> -----------------------------------------
>
>                 Key: SPARK-37180
>                 URL: https://issues.apache.org/jira/browse/SPARK-37180
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Chuck Connell
>            Priority: Major
>
> In regular pandas you can say
> {quote}{{pd.__version__}}{quote}
> to get the pandas version number. PySpark pandas should support the same.
[jira] [Created] (SPARK-37180) PySpark.pandas should support __version__
Chuck Connell created SPARK-37180:
-------------------------------------

             Summary: PySpark.pandas should support __version__
                 Key: SPARK-37180
                 URL: https://issues.apache.org/jira/browse/SPARK-37180
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.2.0
            Reporter: Chuck Connell

In regular pandas you can say pd.__version__ to get the pandas version number. PySpark pandas should support the same.
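For reference, the plain-pandas behavior the issue asks pyspark.pandas to match:

```python
import pandas as pd

# Plain pandas exposes its version as a simple string attribute
assert isinstance(pd.__version__, str)
print(pd.__version__)
```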