Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-21 Thread via GitHub
panbingkun commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-2014269507 @cloud-fan The initial version has been submitted here: https://github.com/apache/spark/pull/45657 I am still improving it, and when it is ready, I will ask for your help in

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-21 Thread via GitHub
panbingkun commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-2014129875 > If no objections, I'll revert this tomorrow. We can have a new PR to add pretty string for array/struct/map in `to_csv`. I will submit a new PR to `add pretty string for

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-21 Thread via GitHub
cloud-fan commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-2012496955 If no objections, I'll revert this tomorrow. We can have a new PR to add pretty string for array/struct/map in `to_csv`. -- This is an automated message from the Apache Git Service.

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-20 Thread via GitHub
cloud-fan commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-2011239475 It's ok to not support round-trip. array/struct/map is not CSV standard either. I think the principle is to avoid breaking changes if possible. Since `to_csv` already supports

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-20 Thread via GitHub
LuciferYang commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-2011183252 I don't oppose this resolution, but if `to_csv` can handle complex data structures, what would our subsequent plans be? Would `from_csv` also need to support complex data structures

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-20 Thread via GitHub
panbingkun commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-2011171313 Use `ToPrettyString.eval` to generate pretty strings for these types in `to_csv`, I have also tried it in this PR. Below:

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-20 Thread via GitHub
cloud-fan commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-2011158182 I'd prefer to fix the inconsistency. Can we use `ToPrettyString.eval` to generate pretty strings for these types in `to_csv`? -- This is an automated message from the Apache Git

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-20 Thread via GitHub
panbingkun commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-2010991612 - Why is the result displayed through `to_csv` inconsistency in Scala and Python for this case? Because this case is on the `python side`, it ultimately uses `GenericArrayData`,

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-20 Thread via GitHub
MaxGekk commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-2009888375 > You never know how people will use Spark and I'm not sure why it's necessary to ban this use case. I believe it is not too bad to ban such output in any case (this is even

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-20 Thread via GitHub
cloud-fan commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-2009679810 You never know how people will use Spark and I'm not sure why it's necessary to ban this use case. Having inconsistency between scala and python is bad. Can we fix this problem by

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-20 Thread via GitHub
panbingkun commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-2009368113 > So we make a breaking change simply because we don't like the object address in the string? can we fix it? These types cannot be read back through 'from_csv' after generating

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-20 Thread via GitHub
cloud-fan commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-2009269901 So we make a breaking change simply because we don't like the object address in the string? can we fix it? -- This is an automated message from the Apache Git Service. To respond to

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-20 Thread via GitHub
panbingkun commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-2009371343 > `to_json` should be fine because it has nested type representation Yeah, there is no standard for nested types in `csv` format, but the `json` format has a standard. --

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-20 Thread via GitHub
HyukjinKwon commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-2008929754 `to_json` should be fine because it has nested type representation -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-20 Thread via GitHub
MaxGekk commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-2008719720 @panbingkun I forgot to ask you. Does `to_json` have the same issue? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-20 Thread via GitHub
MaxGekk commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-2008710532 > generating non-standard but pretty strings @cloud-fan It prints "address" of objects, not values. Do you have an example which you worry about? -- This is an automated message

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-19 Thread via GitHub
cloud-fan commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-200816 why do we remove a working feature? what's wrong with `to_csv` generating non-standard but pretty strings for these values? -- This is an automated message from the Apache Git

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-13 Thread via GitHub
MaxGekk closed pull request #44665: [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data URL: https://github.com/apache/spark/pull/44665 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-13 Thread via GitHub
MaxGekk commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-1994249547 +1, LGTM. Merging to master. Thank you, @panbingkun and @HyukjinKwon @LuciferYang @srowen for review. -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-11 Thread via GitHub
panbingkun commented on code in PR #44665: URL: https://github.com/apache/spark/pull/44665#discussion_r1520791983 ## python/pyspark/sql/functions/builtin.py: ## @@ -15534,19 +15532,7 @@ def to_csv(col: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Col |

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-11 Thread via GitHub
MaxGekk commented on code in PR #44665: URL: https://github.com/apache/spark/pull/44665#discussion_r1520694548 ## python/pyspark/sql/functions/builtin.py: ## @@ -15534,19 +15532,7 @@ def to_csv(col: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Col |

Re: [PR] [SPARK-46654][SQL][Python] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-11 Thread via GitHub
panbingkun commented on code in PR #44665: URL: https://github.com/apache/spark/pull/44665#discussion_r1519273344 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala: ## @@ -260,16 +263,33 @@ case class StructsToCsv( child =

Re: [PR] [SPARK-46654][SQL][Python] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-11 Thread via GitHub
panbingkun commented on code in PR #44665: URL: https://github.com/apache/spark/pull/44665#discussion_r1519273344 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala: ## @@ -260,16 +263,33 @@ case class StructsToCsv( child =

Re: [PR] [SPARK-46654][SQL][Python] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-11 Thread via GitHub
panbingkun commented on code in PR #44665: URL: https://github.com/apache/spark/pull/44665#discussion_r1519260055 ## sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala: ## @@ -294,10 +294,19 @@ class CsvFunctionsSuite extends QueryTest with SharedSparkSession

Re: [PR] [SPARK-46654][SQL][Python] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-11 Thread via GitHub
panbingkun commented on code in PR #44665: URL: https://github.com/apache/spark/pull/44665#discussion_r1519252452 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala: ## @@ -260,16 +263,36 @@ case class StructsToCsv( child =

Re: [PR] [SPARK-46654][SQL][Python] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-11 Thread via GitHub
panbingkun commented on code in PR #44665: URL: https://github.com/apache/spark/pull/44665#discussion_r1519249360 ## python/pyspark/sql/functions/builtin.py: ## @@ -15534,19 +15532,7 @@ def to_csv(col: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Col |