Hi,

I've hit a wall trying to expose a couple of our Scala methods in a
Python version of our project.

My Python function looks like this:

def Write_Graphml(data, graphml_path, sc):
    return sc.getOrCreate()._jvm.io.archivesunleashed.app.WriteGraphML(
        data, graphml_path).apply


Here, data is the result of calling collect() on a DataFrame, i.e. data.collect().

On the Scala side, it is basically:

object WriteGraphML {
  def apply(data: Array[Row], graphmlPath: String): Boolean = {
    // ...
    // massages an Array[Row] into GraphML
    // ...
    true
  }
}

When I try to use it in PySpark, I end up getting this error message:

Py4JError: An error occurred while calling None.io.archivesunleashed.app.WriteGraphML. Trace:
py4j.Py4JException: Constructor io.archivesunleashed.app.WriteGraphML([class java.util.ArrayList, class java.lang.String]) does not exist
        at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)
        at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)
        at py4j.Gateway.invoke(Gateway.java:237)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)



Based on my research, I'm fairly certain it is because of how Py4J is
passing off the Python List (data) to the JVM, and then passing it to
Scala. It's ending up as an ArrayList instead of an Array[Row].

Do I need to tweak data before it is passed to Write_Graphml? Or am I
doing something else wrong here?
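
For discussion, here is a minimal, untested sketch of one alternative,
resting on assumptions that have not been checked against the real
project: that the Scala object's apply is reachable through Py4J as a
static-style method (WriteGraphML.apply(...)) rather than through the
constructor-style call above, and that a hypothetical overload on the
Scala side accepts a DataFrame and calls collect() itself, so no Python
list has to cross the gateway. The write_graphml name and the jvm
parameter are illustrative only; _jdf is PySpark's internal handle to
the JVM DataFrame.

```python
def write_graphml(df, graphml_path, jvm):
    """Hand the JVM DataFrame to the Scala object's apply via Py4J.

    Assumptions (hypothetical, not confirmed for this project):
      - jvm is the Py4J JVM view, e.g. sc._jvm in PySpark.
      - WriteGraphML.apply is callable as a static-style method.
      - The Scala side has an overload taking a DataFrame and a path.
    """
    writer = jvm.io.archivesunleashed.app.WriteGraphML
    # df._jdf is the underlying JVM DataFrame, so the rows never become
    # a Python list (which Py4J would turn into a java.util.ArrayList).
    return writer.apply(df._jdf, graphml_path)
```

With something like this, the call would pass the uncollected DataFrame,
e.g. write_graphml(df, graphml_path, sc._jvm), instead of data.collect().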

Also, I'm not 100% sure whether this is a user or dev list question. Let
me know if I should move this over to user.

Thanks in advance for any help!

cheers!

-nruest
