[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-26 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-151354377
  
@stephend-realitymine Can you ping me on the JIRA 
(https://issues.apache.org/jira/browse/SPARK-10947) so that we can assign it to 
you?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/9249





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-26 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-151354173
  
LGTM. Thank you for working on it. Merging to master.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-151223738
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-151223534
  
**[Test build #44357 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44357/consoleFull)** for PR 9249 at commit [`0f47f1e`](https://github.com/apache/spark/commit/0f47f1e8bb2500e5a12717904079e5001754f6df).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-151223741
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44357/
Test PASSed.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-151175380
  
**[Test build #44357 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44357/consoleFull)** for PR 9249 at commit [`0f47f1e`](https://github.com/apache/spark/commit/0f47f1e8bb2500e5a12717904079e5001754f6df).





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-151171932
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-151171981
  
Merged build started.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-151163276
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44356/
Test FAILed.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-151163274
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-151163267
  
**[Test build #44356 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44356/consoleFull)** for PR 9249 at commit [`70666e5`](https://github.com/apache/spark/commit/70666e5b471ed258c599dd93b6375f44b7cddf50).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-151162581
  
**[Test build #44356 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44356/consoleFull)** for PR 9249 at commit [`70666e5`](https://github.com/apache/spark/commit/70666e5b471ed258c599dd93b6375f44b7cddf50).





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-151160168
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-151160221
  
Merged build started.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-26 Thread stephend-realitymine
Github user stephend-realitymine commented on a diff in the pull request:

https://github.com/apache/spark/pull/9249#discussion_r42979119
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -632,6 +632,39 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
 )
   }
 
+  test("Loading a JSON dataset primitivesAsString returns schema with primitive types as strings") {
+val dir = Utils.createTempDir()
+dir.delete()
+val path = dir.getCanonicalPath
+primitiveFieldAndType.map(record => record.replaceAll("\n", " ")).saveAsTextFile(path)
+val jsonDF = sqlContext.read.option("primitivesAsString", "true").json(path)
+
+val expectedSchema = StructType(
+  StructField("bigInteger", DecimalType(20, 0), true) ::
+  StructField("boolean", BooleanType, true) ::
+  StructField("double", DoubleType, true) ::
+  StructField("integer", LongType, true) ::
+  StructField("long", LongType, true) ::
+  StructField("null", StringType, true) ::
+  StructField("string", StringType, true) :: Nil)
--- End diff --

Yes, you are correct: this should be `StringType`. I will also add the 
test for complex types, as you suggested.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-150645771
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44236/
Test FAILed.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-150645768
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-150645692
  
**[Test build #44236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44236/consoleFull)** for PR 9249 at commit [`18d2861`](https://github.com/apache/spark/commit/18d28619264dbaf10f1e27576f5c4275cbc4ef72).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-150640384
  
**[Test build #44236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44236/consoleFull)** for PR 9249 at commit [`18d2861`](https://github.com/apache/spark/commit/18d28619264dbaf10f1e27576f5c4275cbc4ef72).





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-150639887
  
Thank you @stephend-realitymine for working on it! Overall, the change to 
schema inference looks good. I left a comment on the test.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-150639689
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-150639713
  
Merged build started.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-150639251
  
ok to test





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9249#discussion_r42890747
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala ---
@@ -103,11 +107,14 @@ private[sql] object InferSchema {
 // the type as we pass through all JSON objects.
 var elementType: DataType = NullType
 while (nextUntil(parser, END_ARRAY)) {
-  elementType = compatibleType(elementType, inferField(parser))
+  elementType = compatibleType(elementType, inferField(parser, primitivesAsString))
 }
 
 ArrayType(elementType)
 
+  case (VALUE_NUMBER_INT | VALUE_NUMBER_FLOAT) if primitivesAsString => StringType
+  case (VALUE_TRUE | VALUE_FALSE) if primitivesAsString => StringType
--- End diff --

Add a newline between these two cases?
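
For readers following the hunk above without the Spark sources at hand, here is a minimal, Spark-free sketch of how the two guarded cases interact with the array branch. Everything in it is a hypothetical stand-in: `Json`, `JInt`, `JFloat`, `JBool`, `JString` and `JArray` replace the Jackson tokens, the `DataType` objects only mirror the Spark names, and `compatibleType` is a deliberately crude widening rule rather than Spark's. It is an illustration of the pattern, not the code in `InferSchema.scala`.

```scala
// Hypothetical, simplified model of the inference step; not Spark's code.
object InferSketch {
  // Stand-ins for the JSON values the parser would hand us.
  sealed trait Json
  case class JInt(v: Long) extends Json
  case class JFloat(v: Double) extends Json
  case class JBool(v: Boolean) extends Json
  case class JString(v: String) extends Json
  case class JArray(items: List[Json]) extends Json

  // Stand-ins that only mirror the names of Spark's data types.
  sealed trait DataType
  case object NullType extends DataType
  case object StringType extends DataType
  case object LongType extends DataType
  case object DoubleType extends DataType
  case object BooleanType extends DataType
  case class ArrayType(elementType: DataType) extends DataType

  // Crude widening rule for this sketch only: equal types stay as they are,
  // NullType yields to the other side, anything else falls back to StringType.
  def compatibleType(a: DataType, b: DataType): DataType = (a, b) match {
    case (x, y) if x == y => x
    case (NullType, y) => y
    case (x, NullType) => x
    case _ => StringType
  }

  def inferField(value: Json, primitivesAsString: Boolean): DataType = value match {
    // The flag is passed down, so the array structure is kept and only the
    // element type is affected.
    case JArray(items) =>
      val elementType = items.foldLeft(NullType: DataType) { (acc, item) =>
        compatibleType(acc, inferField(item, primitivesAsString))
      }
      ArrayType(elementType)

    // Guarded cases are listed before the unguarded ones, so they win
    // whenever the flag is set.
    case (_: JInt | _: JFloat) if primitivesAsString => StringType

    case _: JBool if primitivesAsString => StringType

    case _: JInt => LongType
    case _: JFloat => DoubleType
    case _: JBool => BooleanType
    case _: JString => StringType
  }
}
```

In this model, `inferField(JInt(1), primitivesAsString = false)` gives `LongType`, the same value with the flag set gives `StringType`, and an array of mixed primitives with the flag set gives `ArrayType(StringType)`.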





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9249#discussion_r42890562
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -1262,4 +1299,4 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
   )
 }
   }
-}
+}
--- End diff --

Add a newline.





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9249#discussion_r42890545
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -632,6 +632,39 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
 )
   }
 
+  test("Loading a JSON dataset primitivesAsString returns schema with primitive types as strings") {
+val dir = Utils.createTempDir()
+dir.delete()
+val path = dir.getCanonicalPath
+primitiveFieldAndType.map(record => record.replaceAll("\n", " ")).saveAsTextFile(path)
+val jsonDF = sqlContext.read.option("primitivesAsString", "true").json(path)
+
+val expectedSchema = StructType(
+  StructField("bigInteger", DecimalType(20, 0), true) ::
+  StructField("boolean", BooleanType, true) ::
+  StructField("double", DoubleType, true) ::
+  StructField("integer", LongType, true) ::
+  StructField("long", LongType, true) ::
+  StructField("null", StringType, true) ::
+  StructField("string", StringType, true) :: Nil)
--- End diff --

Looks like we need to change all of these data types to `StringType`, right?

Also, can you add a test with complex types (`StructType` and `ArrayType`) 
to make sure we preserve the structure?
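
A rough sketch of what such a check might look like outside the test suite, assuming a Spark 1.x build that includes this patch is on the classpath. The object name `ComplexTypesAsStringCheck` and the sample record are made up for illustration, and the expected schema assumes that inferred fields come back sorted by name and nullable, as in the existing expected schemas in `JsonSuite`. It is not the test that was eventually added to the pull request.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types._

// Hypothetical standalone check; names and data are illustrative only.
object ComplexTypesAsStringCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("primitivesAsString-check").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)

    // One record with a nested object and an array of numbers.
    val records = sc.parallelize(Seq(
      """{"user": {"id": 7, "active": true}, "scores": [1, 2.5]}"""))

    val df = sqlContext.read.option("primitivesAsString", "true").json(records)

    // The struct and the array should survive; only the leaves become strings.
    val expected = StructType(
      StructField("scores", ArrayType(StringType, true), true) ::
      StructField("user", StructType(
        StructField("active", StringType, true) ::
        StructField("id", StringType, true) :: Nil), true) :: Nil)

    assert(df.schema == expected, s"unexpected schema: ${df.schema}")
    sc.stop()
  }
}
```

If the option behaves as described in the pull request, the assertion should hold; a failure would point at structure being lost or at a leaf that is still inferred as a numeric or boolean type.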





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9249#issuecomment-150537791
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread stephend-realitymine
GitHub user stephend-realitymine opened a pull request:

https://github.com/apache/spark/pull/9249

[SPARK-10947] [SQL] With schema inference from JSON into a Dataframe, add 
option to infer all primitive object types as strings

Currently, when a schema is inferred from a JSON file using 
sqlContext.read.json, primitive values are inferred as string, long, 
boolean, etc.

However, if the inferred type is too specific (JSON itself does not 
enforce types), this can cause issues when merging DataFrame schemas.

This pull request adds the option "primitivesAsString" to the JSON 
DataFrameReader. When it is set to true (it defaults to false if not set), all 
primitives are inferred as strings.

Below is an example usage of this new functionality.
```
val jsonDf = sqlContext.read.option("primitivesAsString", "true").json(sampleJsonFile)

scala> jsonDf.printSchema()
root
|-- bigInteger: string (nullable = true)
|-- boolean: string (nullable = true)
|-- double: string (nullable = true)
|-- integer: string (nullable = true)
|-- long: string (nullable = true)
|-- null: string (nullable = true)
|-- string: string (nullable = true)
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/RealityMineLtd/spark stephend-primitives

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9249.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9249


commit a718b8658cda4d849c9057dc9cd601bd6d31503e
Author: Stephen De Gennaro 
Date:   2015-10-23T09:03:22Z

SPARK-10947 Added option to json schema primitivesAsString when true will 
infer primative types as strings

commit 9e6411425546400ae94fd67dc143d2b68c6243aa
Author: Stephen De Gennaro 
Date:   2015-10-23T09:32:38Z

SPARK-10947 adding missed bracket

commit 3989c6aa33acb0af3e151fcd26737cb295de550e
Author: Stephen De Gennaro 
Date:   2015-10-23T09:59:06Z

SPARK-10947 removing duplicate line

commit 18d28619264dbaf10f1e27576f5c4275cbc4ef72
Author: Stephen De Gennaro 
Date:   2015-10-23T10:01:30Z

SPARK-10947 removing extra bracket







[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread stephend-realitymine
Github user stephend-realitymine commented on the pull request:

https://github.com/apache/spark/pull/9245#issuecomment-150523833
  
Going to close this pull request and recreate from rebased master. 





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread stephend-realitymine
Github user stephend-realitymine closed the pull request at:

https://github.com/apache/spark/pull/9245





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9245#issuecomment-150506040
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-10947] [SQL] With schema inference from...

2015-10-23 Thread stephend-realitymine
GitHub user stephend-realitymine opened a pull request:

https://github.com/apache/spark/pull/9245

[SPARK-10947] [SQL] With schema inference from JSON into a Dataframe, add 
option to infer all primitive object types as strings


Currently, when a schema is inferred from a JSON file using 
sqlContext.read.json, primitive values are inferred as string, long, 
boolean, etc.

However, if the inferred type is too specific (JSON itself does not 
enforce types), this can cause issues when merging DataFrame schemas.

This pull request adds the option "primitivesAsString" to the JSON 
DataFrameReader. When it is set to true (it defaults to false if not set), all 
primitives are inferred as strings.

Below is an example usage of this new functionality.
```
val jsonDf = sqlContext.read.option("primitivesAsString", "true").json(primitiveFieldAndType)

scala> jsonDf.printSchema()
root
|-- bigInteger: string (nullable = true)
|-- boolean: string (nullable = true)
|-- double: string (nullable = true)
|-- integer: string (nullable = true)
|-- long: string (nullable = true)
|-- null: string (nullable = true)
|-- string: string (nullable = true)
```


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/RealityMineLtd/spark stephend-primitivesAsString

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9245.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9245


commit 79b68a886a3a6e324e682709af346e599b3efd57
Author: RealityMine Ltd Coordinator 

Date:   2015-10-06T11:16:13Z

Merge pull request #1 from apache/master

Resyncing to master

commit ec74be6f92ef4940bacb9455989c473ed3a1539a
Author: Stephen De Gennaro 
Date:   2015-10-15T12:19:34Z

SPARK-10947 Added option to json schema primativesAsString when true will 
infer primative types as strings

commit 8a879c80a87e1bf9fe17fd58b18bce36294bc17b
Author: Ewan Leith 
Date:   2015-10-22T16:07:39Z

Fixing spelling of primitive from primative



