[ https://issues.apache.org/jira/browse/BEAM-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897560#comment-16897560 ]
Kyle Weaver commented on BEAM-7864: ----------------------------------- To do the repartition in Spark, we need to encode the keys and values separately. We can do that using the component coders of the KvCoder. However, if we loosen the type to Coder<KV<K, V>> to accommodate LengthPrefixCoder, we can't extract separate coders for the keys and values AFAICT. > portable spark Reshuffle coder cast exception > --------------------------------------------- > > Key: BEAM-7864 > URL: https://issues.apache.org/jira/browse/BEAM-7864 > Project: Beam > Issue Type: Bug > Components: runner-spark > Reporter: Kyle Weaver > Assignee: Kyle Weaver > Priority: Major > Labels: portability-spark > > running :sdks:python:test-suites:portable:py35:portableWordCountBatch in > either loopback or docker mode on master fails with exception: > > java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder > cannot be cast to org.apache.beam.sdk.coders.KvCoder > at > org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400) > at > org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147) > at > org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian JIRA (v7.6.14#76016)