[jira] [Commented] (SPARK-5260) Expose JsonRDD.allKeysWithValueTypes() in a utility class

2015-02-03 Thread Corey J. Nolet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304577#comment-14304577
 ] 

Corey J. Nolet commented on SPARK-5260:
---

I'm thinking all the schema-specific functions should be pulled out into an 
object called JsonSchemaFunctions. The allKeysWithValueTypes() and 
createSchema() functions should be exposed via the public API and commented 
well based on their use. 

In the project where I use these functions, I run allKeysWithValueTypes() 
over my entire RDD as it is being saved to a sequence file. An 
Accumulator[Set[(String, DataType)]] aggregates all the schema elements for 
the RDD into a final Set. I store that schema and later call createSchema() 
to get the final StructType that can be used with the SQL table. I also had 
to write an isConflicted(Set[(String, DataType)]) function to determine 
whether a JSON object or JSON array was also encountered as a primitive type 
in one of the records in the RDD, or vice versa.
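
A rough sketch of that flow, in plain Python rather than against the Spark API, might look like the following. Everything here is an assumption made for illustration: the function names only mirror the ones mentioned in this comment, string tags such as "struct" and "long" stand in for Spark SQL DataType values, a set union via reduce stands in for the Accumulator, and null values are ignored when checking for conflicts.

```python
import json
from functools import reduce

# Hypothetical type tags standing in for Spark SQL DataType values.
COMPLEX_TAGS = {"struct", "array"}


def type_tag(value):
    """Map a parsed JSON value to a simple type tag."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "struct"
    if isinstance(value, list):
        return "array"
    raise TypeError(f"unsupported JSON value: {value!r}")


def all_keys_with_value_types(record, prefix=""):
    """Return the set of (dotted key path, type tag) pairs for one JSON object."""
    pairs = set()
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        pairs.add((path, type_tag(value)))
        if isinstance(value, dict):
            pairs |= all_keys_with_value_types(value, path)
    return pairs


def merge(acc, pairs):
    """Set union: the merge step an accumulator of sets would perform."""
    return acc | pairs


def is_conflicted(pairs):
    """True if some key was seen both as a struct/array and as a primitive."""
    tags_by_key = {}
    for key, tag in pairs:
        tags_by_key.setdefault(key, set()).add(tag)
    for tags in tags_by_key.values():
        seen = tags - {"null"}  # assumption: a null never conflicts
        if seen & COMPLEX_TAGS and seen - COMPLEX_TAGS:
            return True
    return False


# "user" is an object in one record and a string in the other.
records = [json.loads(line) for line in (
    '{"id": 1, "user": {"name": "a"}}',
    '{"id": 2, "user": "b"}',
)]
schema_pairs = reduce(merge, map(all_keys_with_value_types, records), set())
print(is_conflicted(schema_pairs))
```

Because the merge is a plain set union, the result does not depend on the order in which records are visited, which is the property that makes a set a reasonable accumulator value.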

 Expose JsonRDD.allKeysWithValueTypes() in a utility class 
 --

 Key: SPARK-5260
 URL: https://issues.apache.org/jira/browse/SPARK-5260
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Corey J. Nolet
Assignee: Corey J. Nolet

 I have found this method extremely useful when implementing my own strategy 
 for inferring a schema from parsed JSON. For now, I've copied the method 
 right out of the JsonRDD class into my own project, but I think it would be 
 immensely useful to keep the code in Spark and expose it publicly somewhere 
 else, such as an object called JsonSchema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5260) Expose JsonRDD.allKeysWithValueTypes() in a utility class

2015-02-03 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303927#comment-14303927
 ] 

Reynold Xin commented on SPARK-5260:


BTW I've also added you to the contributor list.




[jira] [Commented] (SPARK-5260) Expose JsonRDD.allKeysWithValueTypes() in a utility class

2015-01-22 Thread Corey J. Nolet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288621#comment-14288621
 ] 

Corey J. Nolet commented on SPARK-5260:
---

May I be added to the proper list so that I can assign this ticket to myself?




[jira] [Commented] (SPARK-5260) Expose JsonRDD.allKeysWithValueTypes() in a utility class

2015-01-21 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286250#comment-14286250
 ] 

Yin Huai commented on SPARK-5260:
-

[~sonixbp] Unfortunately, I failed to come up with a proper name. Will try 
again :)




[jira] [Commented] (SPARK-5260) Expose JsonRDD.allKeysWithValueTypes() in a utility class

2015-01-16 Thread Corey J. Nolet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280802#comment-14280802
 ] 

Corey J. Nolet commented on SPARK-5260:
---

bq. you can make the change and create a pull request.

I'd love to submit a pull request for this. Do you have a proposed name for 
the utility object?

bq. We do not add fix version(s) until it has been merged into our code base.

Noted. We're quite different in Accumulo: we require fix versions for each 
ticket.




[jira] [Commented] (SPARK-5260) Expose JsonRDD.allKeysWithValueTypes() in a utility class

2015-01-15 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279883#comment-14279883
 ] 

Yin Huai commented on SPARK-5260:
-

[~sonixbp] If you like, you can make the change and create a pull request. I 
can help you with that.

By the way, we do not add fix version(s) until the change has been merged 
into our code base.

 Expose JsonRDD.allKeysWithValueTypes() in a utility class 
 --

 Key: SPARK-5260
 URL: https://issues.apache.org/jira/browse/SPARK-5260
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Corey J. Nolet
 Fix For: 1.3.0




