[ 
https://issues.apache.org/jira/browse/SPARK-36000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389243#comment-17389243
 ] 

Yikun Jiang edited comment on SPARK-36000 at 7/29/21, 4:02 AM:
---------------------------------------------------------------

[~XinrongM] I did some investigation on this and found that the problem is 
triggered in Python-to-Java unpickling, because Decimal('NaN') is not supported 
by net.razorvine.pickle. See more in 
https://issues.apache.org/jira/browse/SPARK-36337

 


was (Author: yikunkero):
[~XinrongM] I did some investigation on this and found that the problem is 
triggered in Python-to-Java unpickling, because Decimal('NaN') is not supported 
by net.razorvine.pickle.

In Python

{code:python}
>>> import decimal, pickle, cloudpickle
>>> pickled = cloudpickle.dumps(decimal.Decimal('NaN'))
>>> pickled
b'\x80\x05\x95!\x00\x00\x00\x00\x00\x00\x00\x8c\x07decimal\x94\x8c\x07Decimal\x94\x93\x94\x8c\x03NaN\x94\x85\x94R\x94.'
>>> pickle.loads(pickled)
Decimal('NaN')
{code}

In Scala

{code:java}
scala> import net.razorvine.pickle.{Pickler, Unpickler, PickleUtils}
scala> val unpickle = new Unpickler
scala> unpickle.loads(PickleUtils.str2bytes("\u0080\u0005\u0095!\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u008c\u0007decimal\u0094\u008c\u0007Decimal\u0094\u0093\u0094\u008c\u0003NaN\u0094\u0085\u0094R\u0094."))
net.razorvine.pickle.PickleException: problem construction object: java.lang.reflect.InvocationTargetException
 at net.razorvine.pickle.objects.AnyClassConstructor.construct(AnyClassConstructor.java:29)
 at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:773)
 at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:213)
 at net.razorvine.pickle.Unpickler.load(Unpickler.java:123)
 at net.razorvine.pickle.Unpickler.loads(Unpickler.java:136)
 ... 48 elided
{code}
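
For reference, here is a minimal Python sketch (standard library only) of what the pickle stream above asks a reader to do. The stream carries a REDUCE instruction that calls decimal.Decimal with the single argument 'NaN', which lines up with the load_reduce and AnyClassConstructor frames in the trace. My assumption, not confirmed in this thread, is that the JVM side maps decimal.Decimal to java.math.BigDecimal, which has no NaN value, so that constructor call is the part that fails.

```python
import decimal
import pickle
import pickletools

nan = decimal.Decimal("NaN")

# __reduce__ tells pickle how to rebuild the object:
# call this class with these arguments.
cls, args = nan.__reduce__()
print(cls.__name__, args)

# Serialize with the standard pickler and list the opcodes in the stream.
payload = pickle.dumps(nan)
opcodes = [op.name for op, _arg, _pos in pickletools.genops(payload)]
print(opcodes)

# Python itself can rebuild the value, because Decimal("NaN") is valid there.
restored = pickle.loads(payload)
print(restored.is_nan())
```

Any reader of this stream has to be able to construct its decimal type from the string 'NaN'; Python can, which is why the round trip works on the Python side.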

I submitted a PR to pickle upstream: https://github.com/irmen/pickle/issues/7 . 
It looks like we can only continue this Jira after that fix lands and we bump 
the pickle version to the fixed release.

> Support creation and operations of ps.Series/Index with Decimal('NaN')
> ----------------------------------------------------------------------
>
>                 Key: SPARK-36000
>                 URL: https://issues.apache.org/jira/browse/SPARK-36000
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Xinrong Meng
>            Priority: Major
>
> The creation and operations of ps.Series/Index with Decimal('NaN') don't 
> work as expected.
> That might be due to an underlying PySpark limitation.
> Please refer to the sub-tasks for the issues detected.



