Jeff Li created SPARK-10805:
-------------------------------

             Summary: JSON Data Frame does not return correct string lengths
                 Key: SPARK-10805
                 URL: https://issues.apache.org/jira/browse/SPARK-10805
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.4.1
            Reporter: Jeff Li
            Priority: Critical
Here is the sample code to run the test:

    @Test
    public void runSchemaTest() throws Exception {
        DataFrame jsonDataFrame = sqlContext.jsonFile("src/test/resources/jsontransform/json.sampledata.json");
        jsonDataFrame.printSchema();
        StructType jsonSchema = jsonDataFrame.schema();
        StructField[] dataFields = jsonSchema.fields();
        for (int fieldIndex = 0; fieldIndex < dataFields.length; fieldIndex++) {
            StructField aField = dataFields[fieldIndex];
            DataType aType = aField.dataType();
            System.out.println("name: " + aField.name()
                + " type: " + aType.typeName()
                + " size: " + aType.defaultSize());
        }
    }

It prints:

    name: _id type: string size: 4096
    name: firstName type: string size: 4096
    name: lastName type: string size: 4096

In my case the actual lengths are: _id is 1 character, firstName is 4 characters, and lastName is 7 characters. The reported 4096 comes from DataType.defaultSize(), which is a fixed per-type estimate used for planning, not a measurement of the data. The Spark JSON DataFrame should have a way to tell the maximum length of each JSON string element in the document.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
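As a sketch of what the requested feature would compute, here is a small plain-Java example with no Spark dependency (the class name and sample values are hypothetical, mirroring the reporter's data: _id 1 character, firstName 4, lastName 7). It scans records and records the maximum string length seen per field. In Spark itself one could likely obtain the same number today by aggregating, e.g. something along the lines of max(length(column)) per string column, rather than reading DataType.defaultSize().

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MaxFieldLengths {
    // Returns the maximum string length observed for each field across
    // all records; field order follows first appearance in the data.
    public static Map<String, Integer> maxLengths(List<Map<String, String>> records) {
        Map<String, Integer> max = new LinkedHashMap<>();
        for (Map<String, String> record : records) {
            for (Map.Entry<String, String> e : record.entrySet()) {
                // Keep the larger of the stored and current length.
                max.merge(e.getKey(), e.getValue().length(), Math::max);
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // Hypothetical record shaped like the reporter's sample JSON.
        Map<String, String> r1 = new LinkedHashMap<>();
        r1.put("_id", "1");
        r1.put("firstName", "John");
        r1.put("lastName", "Johnson");
        System.out.println(maxLengths(List.of(r1)));
        // prints {_id=1, firstName=4, lastName=7}
    }
}
```

A schema-level answer like this would have to be computed by a full pass over the data, which is presumably why Spark only exposes the fixed per-type defaultSize() estimate.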