[jira] [Commented] (SPARK-5260) Expose JsonRDD.allKeysWithValueTypes() in a utility class
[ https://issues.apache.org/jira/browse/SPARK-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304577#comment-14304577 ]

Corey J. Nolet commented on SPARK-5260:
---------------------------------------

I think all the schema-specific functions should be pulled out into an object called JsonSchemaFunctions. The allKeysWithValueTypes() and createSchema() functions should be exposed via the public API and documented according to their use.

In my project, I run allKeysWithValueTypes() over my entire RDD as it is being saved to a sequence file. An Accumulator[Set[(String, DataType)]] aggregates all the schema elements for the RDD into a final Set. I store that schema and later call createSchema() to get the final StructType that can be used with the SQL table.

I also had to write an isConflicted(Set[(String, DataType)]) function to determine whether a key that holds a JSON object or JSON array in one record was also encountered as a primitive type in another record of the RDD, or vice versa.

Expose JsonRDD.allKeysWithValueTypes() in a utility class
---------------------------------------------------------

Key: SPARK-5260
URL: https://issues.apache.org/jira/browse/SPARK-5260
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Corey J. Nolet
Assignee: Corey J. Nolet

I have found this method extremely useful when implementing my own strategy for inferring a schema from parsed JSON. For now, I've copied the method right out of the JsonRDD class into my own project, but I think it would be immensely useful to keep the code in Spark and expose it publicly somewhere else, such as an object called JsonSchema.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
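The workflow described in the comment above (collect (key, type) pairs per record, merge them into one set, then check for conflicts) might look roughly like the following. This is a standalone sketch in plain Python that does not use Spark: key_type_pairs, merge_schemas, and is_conflicted are hypothetical names that only mirror the ideas of allKeysWithValueTypes(), the set-valued Accumulator, and isConflicted(). The type names are simple strings rather than Spark DataType values, and treating null as compatible with any type is an assumption.

```python
# Standalone illustration: plain Python, no Spark. All function names are
# hypothetical and are not part of Spark's API.

def key_type_pairs(record, prefix=""):
    """Return the set of (key path, type name) pairs in one parsed JSON object."""
    pairs = set()
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            pairs.add((path, "object"))
            pairs |= key_type_pairs(value, path)  # descend into nested objects
        elif isinstance(value, list):
            pairs.add((path, "array"))
        elif isinstance(value, bool):  # bool is a subclass of int, so test it first
            pairs.add((path, "boolean"))
        elif isinstance(value, int):
            pairs.add((path, "integer"))
        elif isinstance(value, float):
            pairs.add((path, "double"))
        elif value is None:
            pairs.add((path, "null"))
        else:
            pairs.add((path, "string"))
    return pairs


def merge_schemas(records):
    """Union the per-record pairs, the way a set-valued accumulator would."""
    merged = set()
    for record in records:
        merged |= key_type_pairs(record)
    return merged


def is_conflicted(pairs):
    """True if any key was seen both as an object/array and as a primitive."""
    complex_types = {"object", "array"}
    seen = {}
    for path, type_name in pairs:
        seen.setdefault(path, set()).add(type_name)
    for types in seen.values():
        has_complex = bool(types & complex_types)
        # Assumption: null is compatible with every type, so it is ignored here.
        has_primitive = bool(types - complex_types - {"null"})
        if has_complex and has_primitive:
            return True
    return False
```

For example, merging the records {"a": 1, "b": {"c": "x"}} and {"a": 2, "b": "oops"} yields a set in which "b" appears as both an object and a string, so is_conflicted reports a conflict.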
[jira] [Commented] (SPARK-5260) Expose JsonRDD.allKeysWithValueTypes() in a utility class
[ https://issues.apache.org/jira/browse/SPARK-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303927#comment-14303927 ]

Reynold Xin commented on SPARK-5260:
------------------------------------

BTW I've also added you to the contributor list.
[jira] [Commented] (SPARK-5260) Expose JsonRDD.allKeysWithValueTypes() in a utility class
[ https://issues.apache.org/jira/browse/SPARK-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288621#comment-14288621 ]

Corey J. Nolet commented on SPARK-5260:
---------------------------------------

May I be added to the proper list so that I can assign this ticket to myself?
[jira] [Commented] (SPARK-5260) Expose JsonRDD.allKeysWithValueTypes() in a utility class
[ https://issues.apache.org/jira/browse/SPARK-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286250#comment-14286250 ]

Yin Huai commented on SPARK-5260:
---------------------------------

[~sonixbp] Unfortunately, I failed to come up with a proper name. Will try again :)
[jira] [Commented] (SPARK-5260) Expose JsonRDD.allKeysWithValueTypes() in a utility class
[ https://issues.apache.org/jira/browse/SPARK-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280802#comment-14280802 ]

Corey J. Nolet commented on SPARK-5260:
---------------------------------------

bq. you can make the change and create a pull request.

I'd love to submit a pull request for this. Do you have a proposed name for the utility object?

bq. We do not add fix version(s) until it has been merged into our code base.

Noted. We do this quite differently in Accumulo: we require fix versions for each ticket.
[jira] [Commented] (SPARK-5260) Expose JsonRDD.allKeysWithValueTypes() in a utility class
[ https://issues.apache.org/jira/browse/SPARK-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279883#comment-14279883 ]

Yin Huai commented on SPARK-5260:
---------------------------------

[~sonixbp] If you like, you can make the change and create a pull request. I can help you with that.

By the way, just a note: we do not add fix version(s) until the change has been merged into our code base.

Expose JsonRDD.allKeysWithValueTypes() in a utility class
---------------------------------------------------------

Key: SPARK-5260
URL: https://issues.apache.org/jira/browse/SPARK-5260
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Corey J. Nolet
Fix For: 1.3.0