[ https://issues.apache.org/jira/browse/SPARK-36000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389243#comment-17389243 ]
Yikun Jiang edited comment on SPARK-36000 at 7/29/21, 4:02 AM:
---------------------------------------------------------------

[~XinrongM] I investigated this and found that the problem is triggered during Python-to-Java unpickling, because Decimal('NaN') is not supported by net.razorvine.pickle. See https://issues.apache.org/jira/browse/SPARK-36337 for more details.

was (Author: yikunkero):
[~XinrongM] I investigated this and found that the problem is triggered during Python-to-Java unpickling, because Decimal('NaN') is not supported by net.razorvine.pickle.

In Python
{code:java}
>>> import cloudpickle, decimal, pickle
>>> pickled = cloudpickle.dumps(decimal.Decimal('NaN'))
>>> pickled
b'\x80\x05\x95!\x00\x00\x00\x00\x00\x00\x00\x8c\x07decimal\x94\x8c\x07Decimal\x94\x93\x94\x8c\x03NaN\x94\x85\x94R\x94.'
>>> pickle.loads(pickled)
Decimal('NaN')
{code}

In Scala
{code:java}
scala> import net.razorvine.pickle.{Pickler, Unpickler, PickleUtils}
scala> val unpickle = new Unpickler
scala> unpickle.loads(PickleUtils.str2bytes("\u0080\u0005\u0095!\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u008c\u0007decimal\u0094\u008c\u0007Decimal\u0094\u0093\u0094\u008c\u0003NaN\u0094\u0085\u0094R\u0094."))
net.razorvine.pickle.PickleException: problem construction object: java.lang.reflect.InvocationTargetException
  at net.razorvine.pickle.objects.AnyClassConstructor.construct(AnyClassConstructor.java:29)
  at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:773)
  at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:213)
  at net.razorvine.pickle.Unpickler.load(Unpickler.java:123)
  at net.razorvine.pickle.Unpickler.loads(Unpickler.java:136)
  ... 48 elided
{code}

I submitted a PR to pickle upstream: https://github.com/irmen/pickle/issues/7. It looks like we can only continue this Jira after that fix lands and we bump pickle to the fixed version.
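To make the failure mode a little more concrete, here is a minimal sketch that inspects the Python side of the payload only. It uses the standard-library `pickle` and `pickletools` instead of `cloudpickle` (an assumption, made to keep the example dependency-free), and the variable names are illustrative. It shows that the pickled value is the string 'NaN' plus an instruction to call `decimal.Decimal` on it, which is the step the Java unpickler reports it cannot complete in the stack trace above. The likely reason, not confirmed in this thread, is that `java.math.BigDecimal` has no NaN value.

```python
import decimal
import pickle
import pickletools

# Serialize Decimal('NaN') with the standard pickler. Assumption: plain
# pickle stands in for cloudpickle here to avoid a third-party import.
payload = pickle.dumps(decimal.Decimal("NaN"))

# Walk the opcode stream; genops yields (opcode, argument, position).
ops = [(op.name, arg) for op, arg, _pos in pickletools.genops(payload)]
for name, arg in ops:
    print(name, "" if arg is None else repr(arg))

# The payload carries the string 'NaN' and a REDUCE opcode, i.e. the
# reader is asked to call decimal.Decimal('NaN') to rebuild the value.
has_reduce = any(name == "REDUCE" for name, _ in ops)
has_nan_arg = any(arg == "NaN" for _, arg in ops)

# Python itself can rebuild the value without trouble.
restored = pickle.loads(payload)
print(has_reduce, has_nan_arg, restored.is_nan())
```

Any reader of this payload therefore has to be able to construct a decimal from the literal string 'NaN'; Python can, which is why the round trip only fails on the JVM side.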
> Support creation and operations of ps.Series/Index with Decimal('NaN')
> ----------------------------------------------------------------------
>
>                 Key: SPARK-36000
>                 URL: https://issues.apache.org/jira/browse/SPARK-36000
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Xinrong Meng
>            Priority: Major
>
> The creation and operations of ps.Series/Index with Decimal('NaN') don't
> work as expected.
> That might be due to an underlying PySpark limitation.
> Please refer to the sub-tasks for the issues detected.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)