panbingkun commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-2014269507
@cloud-fan
The initial version has been submitted here:
https://github.com/apache/spark/pull/45657
I am still improving it, and when it is ready, I will ask for your help in reviewing it.
panbingkun commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-2014129875
> If no objections, I'll revert this tomorrow. We can have a new PR to add
pretty string for array/struct/map in `to_csv`.
I will submit a new PR to add pretty strings for array/struct/map in `to_csv`.
cloud-fan commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-2012496955
If no objections, I'll revert this tomorrow. We can have a new PR to add
pretty string for array/struct/map in `to_csv`.
--
This is an automated message from the Apache Git Service.
cloud-fan commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-2011239475
It's ok to not support round-trip. array/struct/map is not CSV standard
either. I think the principle is to avoid breaking changes if possible, since
`to_csv` already supports these types.
LuciferYang commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-2011183252
I don't oppose this resolution, but if `to_csv` can handle complex data
structures, what would our subsequent plans be? Would `from_csv` also need to
support complex data structures?
panbingkun commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-2011171313
> Can we use `ToPrettyString.eval` to generate pretty strings for these types
in `to_csv`?
I have also tried it in this PR.
cloud-fan commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-2011158182
I'd prefer to fix the inconsistency. Can we use `ToPrettyString.eval` to
generate pretty strings for these types in `to_csv`?
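For readers following the thread, here is a minimal sketch (plain Java, a hypothetical helper, not Spark's actual `ToPrettyString` implementation) of the `show()`-style rendering being discussed: arrays as `[e1, e2]` and maps as `{k -> v}`:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: renders nested values the way Spark's show()-style
// pretty output looks (arrays as [e1, e2], maps as {k -> v}).
// This is an illustrative sketch, not Spark's actual code.
public class PrettySketch {
    static String pretty(Object v) {
        if (v instanceof List) {
            return ((List<?>) v).stream()
                .map(PrettySketch::pretty)
                .collect(Collectors.joining(", ", "[", "]"));
        }
        if (v instanceof Map) {
            return ((Map<?, ?>) v).entrySet().stream()
                .map(e -> pretty(e.getKey()) + " -> " + pretty(e.getValue()))
                .collect(Collectors.joining(", ", "{", "}"));
        }
        return String.valueOf(v);
    }

    public static void main(String[] args) {
        System.out.println(pretty(List.of(1, 2, 3)));           // [1, 2, 3]
        System.out.println(pretty(Map.of("a", List.of(1, 2)))); // {a -> [1, 2]}
    }
}
```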
panbingkun commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-2010991612
- Why is the result displayed through `to_csv` inconsistent between Scala and
Python for this case?
Because this case runs on the Python side, it ultimately uses
`GenericArrayData`.
MaxGekk commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-2009888375
> You never know how people will use Spark and I'm not sure why it's
necessary to ban this use case.
I believe it is not too bad to ban such output in any case.
cloud-fan commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-2009679810
You never know how people will use Spark, and I'm not sure why it's necessary
to ban this use case. Having an inconsistency between Scala and Python is bad. Can
we fix the problem instead?
panbingkun commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-2009368113
> So we make a breaking change simply because we don't like the object
address in the string? can we fix it?
These types cannot be read back through `from_csv` after generating the CSV string.
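A tiny illustration of why the round trip fails: once a pretty string like `[1, 2, 3]` is embedded in a CSV line, a plain comma split cannot recover the original structure (a hypothetical example, not `from_csv` itself):

```java
// Shows why pretty-printed nested values don't survive a CSV round trip:
// the commas inside the rendered array collide with the field delimiter.
public class RoundTripDemo {
    public static void main(String[] args) {
        String csvLine = "id1,[1, 2, 3]";   // a to_csv-style row (hypothetical)
        String[] fields = csvLine.split(",");
        System.out.println(fields.length);  // 4, not the original 2 fields
    }
}
```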
cloud-fan commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-2009269901
So we make a breaking change simply because we don't like the object address
in the string? can we fix it?
panbingkun commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-2009371343
> `to_json` should be fine because it has nested type representation
Yeah, there is no standard for nested types in `csv` format, but the `json`
format has a standard.
HyukjinKwon commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-2008929754
`to_json` should be fine because it has nested type representation
MaxGekk commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-2008719720
@panbingkun I forgot to ask you. Does `to_json` have the same issue?
MaxGekk commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-2008710532
> generating non-standard but pretty strings
@cloud-fan It prints "address" of objects, not values. Do you have an
example which you worry about?
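To make the "address" point concrete: JVM arrays inherit `Object.toString()`, so printing one yields a type tag plus an identity hash code rather than the element values (a minimal illustration, not Spark code):

```java
import java.util.Arrays;

// Illustrates the "object address" problem: a JVM array's default
// toString() prints its class tag and identity hash, not its elements.
public class AddressDemo {
    public static void main(String[] args) {
        int[] values = {1, 2, 3};
        System.out.println(values.toString());       // e.g. [I@1b6d3586
        System.out.println(Arrays.toString(values)); // [1, 2, 3]
    }
}
```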
cloud-fan commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-200816
why do we remove a working feature? what's wrong with `to_csv` generating
non-standard but pretty strings for these values?
MaxGekk closed pull request #44665: [SPARK-46654][SQL][PYTHON] Make `to_csv`
explicitly indicate that it does not support some types of data
URL: https://github.com/apache/spark/pull/44665
MaxGekk commented on PR #44665:
URL: https://github.com/apache/spark/pull/44665#issuecomment-1994249547
+1, LGTM. Merging to master.
Thank you, @panbingkun and @HyukjinKwon @LuciferYang @srowen for review.
panbingkun commented on code in PR #44665:
URL: https://github.com/apache/spark/pull/44665#discussion_r1520791983
python/pyspark/sql/functions/builtin.py: @@ -15534,19 +15532,7 @@ def to_csv(col: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Col
MaxGekk commented on code in PR #44665:
URL: https://github.com/apache/spark/pull/44665#discussion_r1520694548
python/pyspark/sql/functions/builtin.py: @@ -15534,19 +15532,7 @@ def to_csv(col: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Col
panbingkun commented on code in PR #44665:
URL: https://github.com/apache/spark/pull/44665#discussion_r1519273344
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala: @@ -260,16 +263,33 @@ case class StructsToCsv(
panbingkun commented on code in PR #44665:
URL: https://github.com/apache/spark/pull/44665#discussion_r1519260055
sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala: @@ -294,10 +294,19 @@ class CsvFunctionsSuite extends QueryTest with SharedSparkSession
panbingkun commented on code in PR #44665:
URL: https://github.com/apache/spark/pull/44665#discussion_r1519252452
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala: @@ -260,16 +263,36 @@ case class StructsToCsv(
panbingkun commented on code in PR #44665:
URL: https://github.com/apache/spark/pull/44665#discussion_r1519249360
python/pyspark/sql/functions/builtin.py: @@ -15534,19 +15532,7 @@ def to_csv(col: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Col