[jira] [Commented] (SPARK-12776) Implement Python API for Datasets
[ https://issues.apache.org/jira/browse/SPARK-12776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576739#comment-15576739 ]

Michael Armbrust commented on SPARK-12776:

I would love to see better support here, but I don't think anyone has taken the time to flesh out the API. Some suggestions I've heard are to support arbitrary objects in Datasets that are pickled and stored as {{ArrayType(BytesType)}}. If you want a schema, you could also tell us that, either with a schema string (where we use the schema to extract columns out of the object) or by using something like a named tuple.

This is all very rough, though, and I know that when we did a prototype we ran into issues caused by the batching we use when talking to the JVM (essentially things like {{df.count()}} broke). If someone wants to flesh out these proposals, that would be great.

> Implement Python API for Datasets
> ---------------------------------
>
> Key: SPARK-12776
> URL: https://issues.apache.org/jira/browse/SPARK-12776
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Reporter: Kevin Cox
> Priority: Minor
>
> Now that the Dataset API is in Scala and Java it would be awesome to see it
> show up in PySpark.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
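As a rough illustration of the suggestion in the comment above, here is a minimal pure-Python sketch of the idea: arbitrary objects are pickled into opaque bytes, and a schema (either a schema string or a named tuple) is used to extract columns back out of each object. This is not PySpark code and not an existing or planned Spark API. The helper names (`encode`, `fields_from_schema_string`, `extract_columns`), the `Person` named tuple, and the `name: type` schema-string format are all assumptions made up for this example.

```python
import pickle
from collections import namedtuple
from types import SimpleNamespace

# Hypothetical named tuple whose field names double as the schema.
Person = namedtuple("Person", ["name", "age"])


def encode(obj):
    """Pickle an arbitrary Python object into an opaque bytes payload."""
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)


def fields_from_schema_string(schema):
    """Read field names from a 'name: type, ...' string (made-up format)."""
    return [part.split(":")[0].strip() for part in schema.split(",")]


def extract_columns(blob, fields):
    """Unpickle one payload and pull the named attributes out as a row."""
    obj = pickle.loads(blob)
    return tuple(getattr(obj, field) for field in fields)


# Arbitrary objects, stored only as pickled bytes.
objects = [
    SimpleNamespace(name="Ada", age=36, extra="ignored"),
    SimpleNamespace(name="Linus", age=28, extra="ignored"),
]
blobs = [encode(o) for o in objects]

# Option 1: the schema is given as a string.
string_fields = fields_from_schema_string("name: string, age: int")
rows_from_string = [extract_columns(b, string_fields) for b in blobs]

# Option 2: the schema is taken from a named tuple's fields.
rows_from_tuple = [Person(*extract_columns(b, Person._fields)) for b in blobs]

print(rows_from_string)
print(rows_from_tuple)
```

Without a schema, the payloads stay opaque and only whole-object operations are possible; with one, named columns can be pulled out per record. The batching problem mentioned in the comment is not modeled here.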
[jira] [Commented] (SPARK-12776) Implement Python API for Datasets
[ https://issues.apache.org/jira/browse/SPARK-12776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557224#comment-15557224 ]

holdenk commented on SPARK-12776:

Just re-opening discussion here: the migration to Datasets was given as the reason to take map out of DataFrames in Python, but as [~rxin] mentioned in SPARK-13233, there aren't any concrete plans to add Datasets right now* (for a "now" of a while back). Are there any plans on the Python side for Datasets (cc [~davies] & [~marmbrus])?
[jira] [Commented] (SPARK-12776) Implement Python API for Datasets
[ https://issues.apache.org/jira/browse/SPARK-12776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304477#comment-15304477 ]

holdenk commented on SPARK-12776:

I think this might be duplicated by SPARK-13233, although all of the PRs on that have been closed without merge.
[jira] [Commented] (SPARK-12776) Implement Python API for Datasets
[ https://issues.apache.org/jira/browse/SPARK-12776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149621#comment-15149621 ]

Gustavo Salazar Torres commented on SPARK-12776:

I will work on some code following what was done in Dataset.scala.
[jira] [Commented] (SPARK-12776) Implement Python API for Datasets
[ https://issues.apache.org/jira/browse/SPARK-12776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149525#comment-15149525 ]

Gustavo Salazar Torres commented on SPARK-12776:

I can work on this, any pointers?