Jeff Li created SPARK-10805:
-------------------------------

             Summary: JSON Data Frame does not return correct string lengths
                 Key: SPARK-10805
                 URL: https://issues.apache.org/jira/browse/SPARK-10805
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.4.1
            Reporter: Jeff Li
            Priority: Critical


Here is the sample code used to run the test:

@Test
public void runSchemaTest() throws Exception {
    DataFrame jsonDataFrame = sqlContext
        .jsonFile("src/test/resources/jsontransform/json.sampledata.json");
    jsonDataFrame.printSchema();

    StructType jsonSchema = jsonDataFrame.schema();
    StructField[] dataFields = jsonSchema.fields();
    for (int fieldIndex = 0; fieldIndex < dataFields.length; fieldIndex++) {
        StructField aField = dataFields[fieldIndex];
        DataType aType = aField.dataType();
        System.out.println("name: " + aField.name()
            + " type: " + aType.typeName()
            + " size: " + aType.defaultSize());
    }
}

name: _id type: string size: 4096
name: firstName type: string size: 4096
name: lastName type: string size: 4096

In my case, the actual values are much shorter: _id is 1 character, the first name is 4 characters, and the last name is 7 characters. The 4096 reported above is StringType's fixed default size estimate, not the length of the actual data.

The Spark JSON DataFrame should provide a way to report the maximum length of each JSON string element in the document.
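In the meantime, the requested behavior can be computed from the data itself by scanning every record and tracking the longest value seen per field. The sketch below is plain Java, not the Spark API; the class name, helper name, and sample values are hypothetical stand-ins chosen to match the field lengths described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MaxLen {
    // Hypothetical stand-in for the requested feature: scan each record
    // and keep the maximum string length observed for every field name.
    static Map<String, Integer> maxLengths(List<Map<String, String>> records) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map<String, String> record : records) {
            for (Map.Entry<String, String> e : record.entrySet()) {
                // merge() keeps the larger of the stored and new lengths
                result.merge(e.getKey(), e.getValue().length(), Math::max);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> r = new LinkedHashMap<>();
        r.put("_id", "1");            // 1 character
        r.put("firstName", "Jeff");   // 4 characters
        r.put("lastName", "Someone"); // 7 characters
        records.add(r);

        System.out.println(maxLengths(records));
        // prints {_id=1, firstName=4, lastName=7}
    }
}
```

In Spark itself the same per-field scan would be a simple aggregation over the DataFrame's rows, which is why exposing it alongside the schema seems reasonable.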



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
