date:20151115

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-15 Thread jerryshao

Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/9597#issuecomment-156794866
  
Hi @koeninger, how about this change? It still keeps the mapping relation, 
so the offset range can be retrieved through partitionId; it just filters out 
empty partitions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-15 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9718#discussion_r44868778
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -267,6 +267,55 @@ class CodeGenContext {
 case dt: DataType if isPrimitiveType(dt) => s"($c1 > $c2 ? 1 : $c1 < 
$c2 ? -1 : 0)"
 case BinaryType => 
s"org.apache.spark.sql.catalyst.util.TypeUtils.compareBinary($c1, $c2)"
 case NullType => "0"
+case array: ArrayType =>
+  val elementType = array.elementType
+  val elementA = freshName("elementA")
+  val isNullA = freshName("isNullA")
+  val elementB = freshName("elementB")
+  val isNullB = freshName("isNullB")
+  val compareFunc = freshName("compareArray")
+  val i = freshName("i")
+  val minLength = freshName("minLength")
+  val funcCode: String =
+s"""
+  public int $compareFunc(ArrayData a, ArrayData b) {
+int lengthA = a.numElements();
+int lengthB = b.numElements();
+int $minLength = (lengthA > lengthB) ? lengthB : lengthA;
+boolean $isNullA;
+boolean $isNullB;
+${javaType(elementType)} $elementA;
+${javaType(elementType)} $elementB;
--- End diff --

These could be defined inside the loop (which lets the compiler optimize them more easily).
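
A minimal Scala sketch of the kind of comparison the generated Java performs, with the per-element temporaries declared inside the loop as suggested. The ordering rules used here (a null element sorts first, and the shorter array sorts first when all shared positions are equal) are assumptions for illustration, and `compareArrays` is a hypothetical name rather than anything from the PR.

```scala
// Hypothetical helper: lexicographic comparison of two arrays whose
// elements may be null (modelled here as Option[Int]).
object ArrayCompareSketch {
  def compareArrays(a: Array[Option[Int]], b: Array[Option[Int]]): Int = {
    val minLength = math.min(a.length, b.length)
    var i = 0
    while (i < minLength) {
      // Temporaries are declared in the loop body, so each one is
      // scoped to a single iteration.
      val elementA = a(i)
      val elementB = b(i)
      val cmp = (elementA, elementB) match {
        case (None, None)       => 0
        case (None, _)          => -1 // assumption: null sorts first
        case (_, None)          => 1
        case (Some(x), Some(y)) => java.lang.Integer.compare(x, y)
      }
      if (cmp != 0) return cmp
      i += 1
    }
    // All shared positions are equal: the shorter array sorts first.
    java.lang.Integer.compare(a.length, b.length)
  }
}
```

Declaring `elementA` and `elementB` per iteration keeps their scope as narrow as possible, which is the point of the review comment.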



[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-15 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9718#discussion_r44868784
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -267,6 +267,55 @@ class CodeGenContext {
 case dt: DataType if isPrimitiveType(dt) => s"($c1 > $c2 ? 1 : $c1 < 
$c2 ? -1 : 0)"
 case BinaryType => 
s"org.apache.spark.sql.catalyst.util.TypeUtils.compareBinary($c1, $c2)"
 case NullType => "0"
+case array: ArrayType =>
+  val elementType = array.elementType
+  val elementA = freshName("elementA")
+  val isNullA = freshName("isNullA")
+  val elementB = freshName("elementB")
+  val isNullB = freshName("isNullB")
+  val compareFunc = freshName("compareArray")
+  val i = freshName("i")
+  val minLength = freshName("minLength")
+  val funcCode: String =
+s"""
+  public int $compareFunc(ArrayData a, ArrayData b) {
+int lengthA = a.numElements();
+int lengthB = b.numElements();
+int $minLength = (lengthA > lengthB) ? lengthB : lengthA;
+boolean $isNullA;
+boolean $isNullB;
+${javaType(elementType)} $elementA;
+${javaType(elementType)} $elementB;
+for (int $i = 0; $i < $minLength; $i++) {
--- End diff --

`i` should be enough here



[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-15 Thread davies

Github user davies commented on the pull request:

https://github.com/apache/spark/pull/9718#issuecomment-156789726
  
LGTM, and some minor comments



[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9720#issuecomment-156789931
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

2015-11-15 Thread kevinyu98

GitHub user kevinyu98 opened a pull request:

https://github.com/apache/spark/pull/9720

[SPARK-11447][SQL] change NullType to StringType during binaryComparison 
between NullType and StringType

While executing the PromoteStrings rule, if one side of a binaryComparison is 
StringType and the other side is not, the current code promotes (casts) the 
StringType side to DoubleType. If the string does not contain a number, the 
cast yields null. So when the comparison is <=> (NULL-safe equal) against Null, 
nothing is filtered out, which causes the problem reported in this JIRA.

I propose the changes in this PR; could you review them?

This problem only happens for <=>; other operators work fine.

scala> val filteredDF = df.filter(df("column") > (new 
Column(Literal(null
filteredDF: org.apache.spark.sql.DataFrame = [column: string]

scala> filteredDF.show
+------+
|column|
+------+
+------+

scala> val filteredDF = df.filter(df("column") === (new 
Column(Literal(null
filteredDF: org.apache.spark.sql.DataFrame = [column: string]

scala> filteredDF.show
+------+
|column|
+------+
+------+

scala> df.registerTempTable("DF")

scala> sqlContext.sql("select * from DF where 'column' = NULL")
res27: org.apache.spark.sql.DataFrame = [column: string]

scala> res27.show
+------+
|column|
+------+
+------+
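
The behaviour described above can be modelled without Spark. The following is a minimal sketch, assuming `None` stands for SQL NULL and that a failed string-to-double cast yields NULL; `castToDouble`, `sqlEq` and `nullSafeEq` are hypothetical names chosen for illustration, not Spark APIs.

```scala
// Minimal model of SQL comparison semantics; None stands for SQL NULL.
object NullSafeEqSketch {
  // Casting a string to double yields NULL when the string is not numeric.
  def castToDouble(s: String): Option[Double] =
    scala.util.Try(s.trim.toDouble).toOption

  // `=`: a NULL on either side makes the result NULL, so a filter drops the row.
  def sqlEq(a: Option[Double], b: Option[Double]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // `<=>`: never returns NULL; two NULLs compare as equal.
  def nullSafeEq(a: Option[Double], b: Option[Double]): Boolean = a == b
}
```

With this model, a non-numeric string such as `"abc"` casts to `None`, so `nullSafeEq(castToDouble("abc"), None)` is true and the row passes the filter, whereas `sqlEq` returns `None` and the row is dropped. That matches the report that only `<=>` is affected.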

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark working_on_spark-11447

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9720.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9720


commit b53b85cad4f5fced9ba003351d5a9af1eb5111fc
Author: Kevin Yu 
Date:   2015-11-13T18:11:59Z

[SPARK-11447]Check NullType before Promote StringType

commit bb705cae18032fcee8f8a532be464f0a995b27cb
Author: Kevin Yu 
Date:   2015-11-15T06:41:48Z

add testcase in ColumnExpressionSuite





[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

2015-11-15 Thread jerryshao

Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/9684#issuecomment-156793955
  
Jenkins, retest this please.



[GitHub] spark pull request: [SPARK-11743][SQL] Add UserDefinedType support...

2015-11-15 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/9712#discussion_r44871929
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/RowEncoderSuite.scala
 ---
@@ -68,7 +117,36 @@ class RowEncoderSuite extends SparkFunSuite {
   .add("structOfArray", new StructType().add("array", arrayOfString))
   .add("structOfMap", new StructType().add("map", mapOfString))
   .add("structOfArrayAndMap",
-new StructType().add("array", arrayOfString).add("map", 
mapOfString)))
+new StructType().add("array", arrayOfString).add("map", 
mapOfString))
+  .add("structOfUDT", structOfUDT))
+
+  test(s"encode/decode: arrayOfUDT") {
+val schema = new StructType()
+  .add("arrayOfUDT", arrayOfUDT)
+
+val encoder = RowEncoder(schema)
+
+val input: Row = Row(Seq(new ExamplePoint(0.1, 0.2), new 
ExamplePoint(0.3, 0.4)))
+val row = encoder.toRow(input)
+val convertedBack = encoder.fromRow(row)
+assert(input.getSeq[ExamplePoint](0) == 
convertedBack.getSeq[ExamplePoint](0))
+  }
+
+  test(s"encode/decode: Product") {
+val schema = new StructType()
+  .add("structAsProduct",
+new StructType()
+  .add("int", IntegerType)
+  .add("string", StringType)
+  .add("double", DoubleType))
+
+val encoder = RowEncoder(schema)
+
+val input: Row = Row((100, "test", 0.123))
--- End diff --

If one of the input parameters is a `Tuple2`, then we need to use the encoder 
to decode a catalyst value to an external value, i.e. decode an `InternalRow` 
object to a `Tuple2` object. I think this is hard for a `RowEncoder` (your 
change only makes it possible to encode a `Product` into an `InternalRow`, but 
not vice versa); we should use `ProductEncoder` for this case.



[GitHub] spark pull request: [SPARK-11689] [ML] Add user guide and example ...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9722#issuecomment-156814081
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11086][SPARKR] Use dropFactors column-w...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9099#issuecomment-156814877
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45952/
Test FAILed.



[GitHub] spark pull request: [SPARK-11086][SPARKR] Use dropFactors column-w...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9099#issuecomment-156816289
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11086][SPARKR] Use dropFactors column-w...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9099#issuecomment-156816290
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45953/
Test PASSed.



[GitHub] spark pull request: [SPARK-11743][SQL] Add UserDefinedType support...

2015-11-15 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/9712#discussion_r44872999
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/RowEncoderSuite.scala
 ---
@@ -68,7 +117,36 @@ class RowEncoderSuite extends SparkFunSuite {
   .add("structOfArray", new StructType().add("array", arrayOfString))
   .add("structOfMap", new StructType().add("map", mapOfString))
   .add("structOfArrayAndMap",
-new StructType().add("array", arrayOfString).add("map", 
mapOfString)))
+new StructType().add("array", arrayOfString).add("map", 
mapOfString))
+  .add("structOfUDT", structOfUDT))
+
+  test(s"encode/decode: arrayOfUDT") {
+val schema = new StructType()
+  .add("arrayOfUDT", arrayOfUDT)
+
+val encoder = RowEncoder(schema)
+
+val input: Row = Row(Seq(new ExamplePoint(0.1, 0.2), new 
ExamplePoint(0.3, 0.4)))
+val row = encoder.toRow(input)
+val convertedBack = encoder.fromRow(row)
+assert(input.getSeq[ExamplePoint](0) == 
convertedBack.getSeq[ExamplePoint](0))
+  }
+
+  test(s"encode/decode: Product") {
+val schema = new StructType()
+  .add("structAsProduct",
+new StructType()
+  .add("int", IntegerType)
+  .add("string", StringType)
+  .add("double", DoubleType))
+
+val encoder = RowEncoder(schema)
+
+val input: Row = Row((100, "test", 0.123))
--- End diff --

If we have an input parameter mapping to a `StructType` field in an 
`InternalRow`, we will use `Row` as its input type, e.g. 
`sqlContext.udf.register("udfFunc", (ns: Row) => { (ns.getInt(0), 
ns.getString(1)) })`. But we can't use `Row` as the output type of a UDF. 
If `schemaFor` can't infer the input types correctly, we can still get the 
input schema later from the child expressions of the `ScalaUDF`. The output 
type of the UDF, however, can only be inferred by `schemaFor`.




[GitHub] spark pull request: [SPARK-11086][SPARKR] Use dropFactors column-w...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9099#issuecomment-156816256
  
**[Test build #45953 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45953/consoleFull)**
 for PR 9099 at commit 
[`5e9580c`](https://github.com/apache/spark/commit/5e9580cb03b1e28b790deb099a443c64fbcae9a6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `# class of POSIXlt is c("POSIXlt" "POSIXt")`



[GitHub] spark pull request: [SPARK-11689] [ML] Add user guide and example ...

2015-11-15 Thread hhbyyh

GitHub user hhbyyh opened a pull request:

https://github.com/apache/spark/pull/9722

[SPARK-11689] [ML] Add user guide and example code for LDA under spark.ml

jira: https://issues.apache.org/jira/browse/SPARK-11689

Add simple user guide for LDA under spark.ml and example code under 
examples/. Use include_example to include example code in the user guide 
markdown. Check SPARK-11606 for instructions.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hhbyyh/spark ldaMLExample

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9722.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9722


commit aeea7909c60af8883590fceba687a7dc3b98cd8f
Author: Yuhao Yang 
Date:   2015-11-15T13:31:37Z

ml lda example and doc

commit 09b59de953101d6dba5023033af2a2bb2ea5385f
Author: Yuhao Yang 
Date:   2015-11-15T13:42:41Z

add link to new doc

commit 8a6d2d61bf31e653b4ddd5f05b7c84b1577c9694
Author: Yuhao Yang 
Date:   2015-11-15T13:45:17Z

doc fix





[GitHub] spark pull request: [SPARK-11689] [ML] Add user guide and example ...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9722#issuecomment-156812972
  
**[Test build #45951 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45951/consoleFull)**
 for PR 9722 at commit 
[`8a6d2d6`](https://github.com/apache/spark/commit/8a6d2d61bf31e653b4ddd5f05b7c84b1577c9694).



[GitHub] spark pull request: [SPARK-11689] [ML] Add user guide and example ...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9722#issuecomment-156814056
  
**[Test build #45951 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45951/consoleFull)**
 for PR 9722 at commit 
[`8a6d2d6`](https://github.com/apache/spark/commit/8a6d2d61bf31e653b4ddd5f05b7c84b1577c9694).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public class JavaLDAExample`



[GitHub] spark pull request: [SPARK-11689] [ML] Add user guide and example ...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9722#issuecomment-156814083
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45951/
Test PASSed.



[GitHub] spark pull request: [SPARK-11086][SPARKR] Use dropFactors column-w...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9099#issuecomment-156814876
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-11086][SPARKR] Use dropFactors column-w...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9099#issuecomment-156814873
  
**[Test build #45952 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45952/consoleFull)**
 for PR 9099 at commit 
[`69fa917`](https://github.com/apache/spark/commit/69fa917292acd1f7c76d2c31201839af9aca54c6).
 * This patch **fails R style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `# class of POSIXlt is c("POSIXlt" "POSIXt")`



[GitHub] spark pull request: [SPARK-11086][SPARKR] Use dropFactors column-w...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9099#issuecomment-156815446
  
**[Test build #45953 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45953/consoleFull)**
 for PR 9099 at commit 
[`5e9580c`](https://github.com/apache/spark/commit/5e9580cb03b1e28b790deb099a443c64fbcae9a6).



[GitHub] spark pull request: [SPARK-11744][Launcher] Fix print version thro...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9721#issuecomment-156815860
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11744][Launcher] Fix print version thro...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9721#issuecomment-156815861
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45950/
Test PASSed.



[GitHub] spark pull request: [SPARK-11744][Launcher] Fix print version thro...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9721#issuecomment-156815828
  
**[Test build #45950 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45950/consoleFull)**
 for PR 9721 at commit 
[`7f90c60`](https://github.com/apache/spark/commit/7f90c60c7930d0802eb7465ed34a94f0b71b890f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11191] [SPARK-11311] [SQL] Backports #9...

2015-11-15 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/9671#issuecomment-156819436
  
cc @yhuai



[GitHub] spark pull request: [SPARK-11191] [SQL] Looks up temporary functio...

2015-11-15 Thread liancheng

Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/9664#discussion_r44873198
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -454,7 +454,7 @@ class HiveContext private[hive](
   // Note that HiveUDFs will be overridden by functions registered in this 
context.
   @transient
   override protected[sql] lazy val functionRegistry: FunctionRegistry =
-new HiveFunctionRegistry(FunctionRegistry.builtin.copy()) {
+new HiveFunctionRegistry(FunctionRegistry.builtin.copy(), this) {
--- End diff --

Thanks for pointing this out. At first I didn't notice this part either. 
Just reading the code, I'd assume that this already fixes the issue. But it 
wasn't the case.

After some investigation, I'm quite puzzled by the behavior here. Without 
this PR, we can add a jar, create a UDTF from the jar, and apply this UDTF in 
SQL queries successfully. However, `DESCRIBE FUNCTION` still returns "Function: 
 is not found". I tried single-step debugging `DescribeFunction` 
and noticed that the `sqlContext.functionRegistry.lookupFunction` call goes 
directly to `HiveFunctionRegistry.lookupFunction` without calling the overridden 
version defined in this anonymous class.

Anyway, now we can remove this anonymous class.



[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-15 Thread koeninger

Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/9597#issuecomment-156822484

Are you 100% sure that all uses of the partition array only use the index
associated with the individual Partition, and not its position in the array?

At the very least, you should have a test for an RDD with partially empty
partitions and ensure that the indices for `HasOffsetRanges` line up with the
task context partition id.
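
The distinction between a partition's index and its position in the array can be sketched in plain Scala. This is only an illustration under stated assumptions: `OffsetRange` and `Part` are hypothetical stand-ins defined here, not the Spark or Kafka classes, and `nonEmpty` and `rangeFor` are invented helper names.

```scala
// Hypothetical stand-ins for a Kafka-style offset range and an RDD partition.
final case class OffsetRange(topic: String, partition: Int, from: Long, until: Long) {
  def isEmpty: Boolean = from == until
}
final case class Part(index: Int, range: OffsetRange)

object PartitionIndexSketch {
  // Keep each partition's original index; only drop the empty ranges.
  def nonEmpty(ranges: Seq[OffsetRange]): Seq[Part] =
    ranges.zipWithIndex.collect {
      case (r, i) if !r.isEmpty => Part(i, r)
    }

  // Look up by partition index, not by position in the filtered sequence.
  def rangeFor(parts: Seq[Part], partitionId: Int): Option[OffsetRange] =
    parts.find(_.index == partitionId).map(_.range)
}
```

After filtering four ranges down to the two non-empty ones, the surviving partitions keep indices 1 and 3 but sit at positions 0 and 1. Any code that reads `parts(partitionId)` instead of calling `rangeFor` would pick the wrong range or fail, which is the case a test should cover.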

On Sun, Nov 15, 2015 at 3:48 AM, Saisai Shao 
wrote:

> Hi @koeninger  , how about this change?
> Still keeping the mapping relations, so offset range can be retrieved
> through partitionId, just filter out empty partition.


[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9565#issuecomment-156822427
  
**[Test build #45954 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45954/consoleFull)**
 for PR 9565 at commit 
[`5c18c0c`](https://github.com/apache/spark/commit/5c18c0c6a3b1d65ee7ed81f54f816796db63d394).



[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2015-11-15 Thread zhonghaihua

Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-156810307
  
Hi @zhichao-li, thanks for doing this. I had a problem with partitions being 
scanned slowly, so I applied this patch to my Spark version. In my case:
 * Before applying this patch, it took at least 3 or 4 minutes to scan 
partitions.
 * After applying this patch, this stage takes only about 20 seconds.

I am happy to see it takes effect in my case; it solves my problem. Would it 
be better to add a conf to control whether this feature is used?



[GitHub] spark pull request: [SPARK-11744][Launcher] Fix print version thro...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9721#issuecomment-156810256
  
**[Test build #45950 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45950/consoleFull)**
 for PR 9721 at commit 
[`7f90c60`](https://github.com/apache/spark/commit/7f90c60c7930d0802eb7465ed34a94f0b71b890f).



[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...

2015-11-15 Thread zhonghaihua

Github user zhonghaihua commented on the pull request:

https://github.com/apache/spark/pull/9483#issuecomment-156810714
  
Hi @zhichao-li, thanks for doing this. I had a problem with partitions being 
scanned slowly, so I applied this patch to my Spark version. In my case:
* Before applying this patch, it took at least 3 or 4 minutes to scan 
partitions.
* After applying this patch, this stage takes only about 20 seconds.

I am happy to see it takes effect in my case; it solves my problem. Would it 
be better to add a conf to control whether this feature is used?



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-156810989
  
**[Test build #2060 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2060/consoleFull)**
 for PR 9264 at commit 
[`c19b3c0`](https://github.com/apache/spark/commit/c19b3c084d0c870a422df6e32f8efbe7620d335c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-15 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-156801643
  
@reggert yeah, with thousands of tests, some of them integration-style 
tests, flakiness is a pretty regular occurrence. You can see a number of PRs to 
improve individual ones. We can just retest. Don't worry about fixing tests here.



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-156802784
  
**[Test build #2060 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2060/consoleFull)**
 for PR 9264 at commit 
[`c19b3c0`](https://github.com/apache/spark/commit/c19b3c084d0c870a422df6e32f8efbe7620d335c).



[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9684#issuecomment-156802883
  
**[Test build #45948 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45948/consoleFull)**
 for PR 9684 at commit 
[`5347734`](https://github.com/apache/spark/commit/53477342d9fadc9f1da365805b81688f9b8ee5bd).



[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9597#issuecomment-156806763
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45949/
Test PASSed.



[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9597#issuecomment-156806762
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9684#issuecomment-15686
  
**[Test build #45948 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45948/consoleFull)**
 for PR 9684 at commit 
[`5347734`](https://github.com/apache/spark/commit/53477342d9fadc9f1da365805b81688f9b8ee5bd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9597#issuecomment-156803004
  
**[Test build #45949 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45949/consoleFull)**
 for PR 9597 at commit 
[`30e1578`](https://github.com/apache/spark/commit/30e1578dcb2d690a5d0d48b7e6f1a7463aedc158).



[GitHub] spark pull request: [SPARK-11672] [ML] set active SQLContext in Ja...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9719#issuecomment-156802990
  
**[Test build #45947 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45947/consoleFull)**
 for PR 9719 at commit 
[`1889a37`](https://github.com/apache/spark/commit/1889a374ed4cd3cdf4cd889d61fffc6f78a11d2a).



[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9597#issuecomment-156806725
  
**[Test build #45949 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45949/consoleFull)**
 for PR 9597 at commit 
[`30e1578`](https://github.com/apache/spark/commit/30e1578dcb2d690a5d0d48b7e6f1a7463aedc158).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11672] [ML] set active SQLContext in Ja...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9719#issuecomment-156807487
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11672] [ML] set active SQLContext in Ja...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9719#issuecomment-156807488
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45947/
Test PASSed.



[GitHub] spark pull request: [SPARK-11672] [ML] set active SQLContext in Ja...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9719#issuecomment-156807462
  
**[Test build #45947 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45947/consoleFull)**
 for PR 9719 at commit 
[`1889a37`](https://github.com/apache/spark/commit/1889a374ed4cd3cdf4cd889d61fffc6f78a11d2a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9684#issuecomment-156811167
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45948/
Test PASSed.



[GitHub] spark pull request: [SPARK-11744][Launcher] Fix print version thro...

2015-11-15 Thread jerryshao

GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/9721

[SPARK-11744][Launcher] Fix print version throw exception when using 
pyspark shell

Exception details can be seen here 
(https://issues.apache.org/jira/browse/SPARK-11744).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-11744

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9721.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9721


commit 7f90c60c7930d0802eb7465ed34a94f0b71b890f
Author: jerryshao 
Date:   2015-11-15T12:24:54Z

Fix print version exception on python shell





[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-15 Thread xwu0226

Github user xwu0226 commented on the pull request:

https://github.com/apache/spark/pull/9542#issuecomment-156845293
  
Accidentally pushed another JIRA's code together. I am backing it out.



[GitHub] spark pull request: SPARK-6541 - Sort executors by ID (numeric)

2015-11-15 Thread jbonofre

Github user jbonofre commented on the pull request:

https://github.com/apache/spark/pull/9165#issuecomment-156845898
  
PR rebased.



[GitHub] spark pull request: SPARK-6541 - Sort executors by ID (numeric)

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9165#issuecomment-156846400
  
**[Test build #45957 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45957/consoleFull)**
 for PR 9165 at commit 
[`941db75`](https://github.com/apache/spark/commit/941db75ea21571b729ec6e35ee8b8b03190840f1).



[GitHub] spark pull request: [SPARK-10749][MESOS] Support multiple roles wi...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8872#issuecomment-156846731
  
**[Test build #45958 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45958/consoleFull)**
 for PR 8872 at commit 
[`7a16052`](https://github.com/apache/spark/commit/7a16052478d6f2723c57e21cb29748bce7bf25e0).



[GitHub] spark pull request: [SPARK-11572] Exit AsynchronousListenerBus thr...

2015-11-15 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/9546#issuecomment-156846877
  
I think that this has caused the  
"org.apache.spark.scheduler.EventLoggingListenerSuite.End-to-end event logging" 
test to become flaky in Jenkins.

I believe that this patch may have changed the behavior of the listener bus 
during shutdown. According to the `stop()` method's Scaladoc:

```
  /**
   * Stop the listener bus. It will wait until the queued events have been processed, but drop the
   * new events after stopping.
   */
```

It looks like this patch just changes things so that we halt immediately 
once the `stopped` flag has been set rather than waiting for the queue to drain.
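
The difference between the two shutdown behaviors can be sketched in a few lines of single-threaded Scala. This is an illustration only, not the real `AsynchronousListenerBus` code: `ListenerBusSketch`, `run`, `stopAfter` and `drainOnStop` are hypothetical names, and the real bus runs on its own thread.

```scala
import scala.collection.mutable

object ListenerBusSketch {
  // Handles queued events and returns the ones that were processed.
  // stop() is simulated as being called right after `stopAfter` events.
  // drainOnStop = true follows the Scaladoc: finish the queued events, then exit.
  // drainOnStop = false exits as soon as the stopped flag is observed.
  def run(queued: Seq[String], stopAfter: Int, drainOnStop: Boolean): List[String] = {
    val queue = mutable.Queue.empty[String]
    queue ++= queued
    val handled = mutable.ArrayBuffer.empty[String]
    var stopped = false
    while (queue.nonEmpty && !(stopped && !drainOnStop)) {
      handled += queue.dequeue()
      if (handled.size == stopAfter) stopped = true
    }
    handled.toList
  }
}
```

With three queued events and a stop after the first one, the draining variant handles all three, while the immediate-exit variant handles only the first. A test that expects the later events in the event log would then fail intermittently, depending on when the stop lands.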



[GitHub] spark pull request: [SPARK-11734][SQL] Rename TungstenProject -> P...

2015-11-15 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/9700#issuecomment-156841942
  
Going to merge this first. I will submit follow-up PRs if there is any 
post-hoc feedback.



[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9542#issuecomment-156842519
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-15 Thread reggert

Github user reggert commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-156844499
  
@srowen Okay. I was just worried I was going to get blamed for breaking 
something. ;-)

What's left to do here?



[GitHub] spark pull request: [SPARK-11572] Exit AsynchronousListenerBus thr...

2015-11-15 Thread ted-yu

Github user ted-yu commented on the pull request:

https://github.com/apache/spark/pull/9546#issuecomment-156850070
  
I checked 

https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=spark-test/4122/consoleFull
back till:

https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=spark-test/4118/consoleFull


https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.3,label=spark-test/4122/consoleFull
back until:

https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.3,label=spark-test/4119/consoleFull

EventLoggingListenerSuite passed in every build above




[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9542#issuecomment-156850658
  
**[Test build #45959 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45959/consoleFull)**
 for PR 9542 at commit 
[`4481c82`](https://github.com/apache/spark/commit/4481c82a98af62cc4d46d2f07c4d728236bf6d83).



[GitHub] spark pull request: [SPARK-11572] Exit AsynchronousListenerBus thr...

2015-11-15 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/9546#issuecomment-156851581
  
Look at the Master SBT build; there's definitely a regression: 
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/4014/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=spark-test/testReport/junit/org.apache.spark.scheduler/EventLoggingListenerSuite/End_to_end_event_logging/history/

If you keep clicking on the "Older" link to page back through the test 
history, you'll find that this first started in 
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=spark-test/3982/testReport/,
 whose changeset includes this patch: 
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.3,label=spark-test/3982/changes


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

2015-11-15 Thread selvinsource

Github user selvinsource commented on the pull request:

https://github.com/apache/spark/pull/9057#issuecomment-156852089
  
@yinxusen 

https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class
I tested both multinomial and Bernoulli.
The Bernoulli results are good; I used the SPEC Heart dataset.
The multinomial results are not as good: the scores in JPMML differ from 
Spark's predictions, which confirms your worries.

We could start by supporting only Bernoulli and throw an 
IllegalArgumentException for multinomial in PMMLModelExportFactory.
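
A rough sketch of that guard, in plain Scala. `NaiveBayesExportSketch`, `exporterFor` and the returned placeholder string are hypothetical and do not reflect the actual `PMMLModelExportFactory` code. It also assumes the model type is available as the strings "bernoulli" and "multinomial".

```scala
object NaiveBayesExportSketch {
  // Hypothetical stand-in for the factory's model-type check.
  // Returns a placeholder exporter name for supported model types.
  def exporterFor(modelType: String): String = modelType match {
    case "bernoulli" => "BernoulliNaiveBayesPMMLExport" // placeholder name
    case other =>
      throw new IllegalArgumentException(
        s"PMML export is not supported for Naive Bayes model type: $other")
  }
}
```

Failing fast like this keeps users from exporting a multinomial model whose PMML scores would not match Spark's predictions.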



[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-15 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9542#issuecomment-156853749
  
LGTM pending jenkins.



[GitHub] spark pull request: [SPARK-11191] [SPARK-11311] [SQL] Backports #9...

2015-11-15 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9671#issuecomment-156854295
  
Merge to branch 1.5.



[GitHub] spark pull request: [SPARK-11191] [SPARK-11311] [SQL] Backports #9...

2015-11-15 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9671#issuecomment-156854870
  
Merged. 



[GitHub] spark pull request: [SPARK-11191] [SQL] Looks up temporary functio...

2015-11-15 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9664#discussion_r44877679
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -454,7 +454,7 @@ class HiveContext private[hive](
   // Note that HiveUDFs will be overridden by functions registered in this context.
   @transient
   override protected[sql] lazy val functionRegistry: FunctionRegistry =
-    new HiveFunctionRegistry(FunctionRegistry.builtin.copy()) {
+    new HiveFunctionRegistry(FunctionRegistry.builtin.copy(), this) {
--- End diff --

Can we have a PR to remove this anonymous class from master?



[GitHub] spark pull request: [SPARK-11672] [ML] set active SQLContext in Ja...

2015-11-15 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9719#issuecomment-156855129
  
LGTM. Merge to branch 1.6 and master.



[GitHub] spark pull request: [SPARK-11672] [ML] set active SQLContext in Ja...

2015-11-15 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/9719



[GitHub] spark pull request: [SPARK-10181][SQL] Do kerberos login for crede...

2015-11-15 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9272#issuecomment-156866902
  
Thanks! Merging to master and branch 1.6.



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-15 Thread reggert

Github user reggert commented on the pull request:

https://github.com/apache/spark/pull/9264#issuecomment-156868899
  
I'm not particularly happy about the (internal) API exposed by 
`ComplexFutureAction`. Having callers instantiate the action and then mutate it 
by calling the `run` and `submitJob` methods just seems sloppy and error-prone.

I think I would prefer that instead of having a `run` method that takes a 
closure returning `Future[T]`, `ComplexFutureAction` should accept a 
constructor parameter of type `JobSubmitter => Future[T]`, where `JobSubmitter` 
would be a (Spark-private) trait providing the `submitJob` method. This would 
make `ComplexFutureAction` more or less immutable after construction (except 
for cancellation) and prevent someone from calling `run` or `submitJob` from 
outside Spark and making a mess of things. This is a fairly major change that 
introduces a new trait, so I will hold off implementing it until I get some 
positive feedback about it.
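
A minimal sketch of the constructor-based shape described above, not Spark's actual code. `SketchFutureAction`, its `executor` parameter, and the simplified `submitJob` signature are hypothetical names chosen for illustration; only `JobSubmitter` and the `JobSubmitter => Future[T]` parameter come from the proposal itself.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical stand-in for the proposed Spark-private trait.
trait JobSubmitter {
  def submitJob[U](compute: () => U): Future[U]
}

// Hypothetical, simplified action: the body is supplied at construction time,
// so callers have no `run` or `submitJob` method to invoke afterwards.
class SketchFutureAction[T](executor: ExecutionContext)(body: JobSubmitter => Future[T]) {
  // Cancellation is the only mutable state left after construction.
  @volatile private var cancelled = false

  private val submitter: JobSubmitter = new JobSubmitter {
    def submitJob[U](compute: () => U): Future[U] =
      if (cancelled) Future.failed(new InterruptedException("action was cancelled"))
      else Future(compute())(executor)
  }

  // Evaluated exactly once, while the object is being constructed.
  val result: Future[T] = body(submitter)

  def cancel(): Unit = { cancelled = true }
}

// Usage: chain two "jobs" without ever touching the action's internals.
implicit val ec: ExecutionContext = ExecutionContext.global
val action = new SketchFutureAction[Int](ec)(jobs =>
  jobs.submitJob(() => 1 + 2).flatMap(sum => jobs.submitJob(() => sum * 10)))
```

In this shape the only thing a caller can do with the action after construction is read `result` or call `cancel()`.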

Additionally, it seems like certain common aspects of `SimpleFutureAction` 
and `ComplexFutureAction` (such as the `_cancellation` field and the base 
implementation of `cancel`) could be pulled out into an implementation trait 
(i.e., `FutureActionLike`) to avoid duplicating code.

Thoughts?



[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

2015-11-15 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9367#discussion_r44880055
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala ---
@@ -119,5 +124,47 @@ class TaskResultGetterSuite extends SparkFunSuite with BeforeAndAfter with Local
     // Make sure two tasks were run (one failed one, and a second retried one).
     assert(scheduler.nextTaskId.get() === 2)
   }
+
+  // SPARK-11195
+  // Make sure we are using the context classloader when deserializing failed TaskResults instead of
+  // the Spark classloader.
+  test("failed task deserialized with the correct classloader") {
+    // compile a small jar containing an exception that will be thrown on an executor.
+    val tempDir = Utils.createTempDir()
+    val srcDir = new File(tempDir, "repro/")
+    srcDir.mkdirs()
+    val excSource = new JavaSourceFromString(new File(srcDir, "MyException").getAbsolutePath,
+      """package repro;
+        |
+        |public class MyException extends Exception {
+        |}
+      """.stripMargin)
+    val excFile = TestUtils.createCompiledClass("MyException", srcDir, excSource, Seq.empty)
+    val jarFile = new File(tempDir, "testJar-%s.jar".format(System.currentTimeMillis()))
+    TestUtils.createJar(Seq(excFile), jarFile, directoryPrefix = Some("repro"))
+
+    // load the exception from the jar
+    val loader = new MutableURLClassLoader(new Array[URL](0), Thread.currentThread.getContextClassLoader)
+    loader.addURL(jarFile.toURI.toURL)
+    Thread.currentThread().setContextClassLoader(loader)
--- End diff --

Can we set the original loader back?
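
One way to do that, sketched here as an assumption rather than as the change that was eventually made: wrap the swap in a small helper that restores the previous loader in a `finally` block. `withContextClassLoader` is a hypothetical name and is not part of Spark or `TestUtils`.

```scala
import java.net.{URL, URLClassLoader}

// Hypothetical helper: run `body` with `loader` installed as the thread's context
// classloader, then put the previous loader back even if `body` throws.
def withContextClassLoader[T](loader: ClassLoader)(body: => T): T = {
  val thread = Thread.currentThread()
  val original = thread.getContextClassLoader
  thread.setContextClassLoader(loader)
  try body finally thread.setContextClassLoader(original)
}

// Usage: the temporary loader is visible only inside the block.
val before = Thread.currentThread().getContextClassLoader
val temp = new URLClassLoader(Array.empty[URL], before)
val seenInside = withContextClassLoader(temp) {
  Thread.currentThread().getContextClassLoader
}
```

Restoring in `finally` matters in a test suite because later tests run on the same thread and would otherwise inherit the temporary loader.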



[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

2015-11-15 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9367#discussion_r44880126
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala ---
@@ -119,5 +124,47 @@ class TaskResultGetterSuite extends SparkFunSuite with BeforeAndAfter with Local
     // Make sure two tasks were run (one failed one, and a second retried one).
     assert(scheduler.nextTaskId.get() === 2)
   }
+
+  // SPARK-11195
+  // Make sure we are using the context classloader when deserializing failed TaskResults instead of
+  // the Spark classloader.
+  test("failed task deserialized with the correct classloader") {
--- End diff --

Can we add comments to outline how this fix is tested?



[GitHub] spark pull request: [SPARK-11738] [SQL] Making ArrayType orderable

2015-11-15 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/9718



[GitHub] spark pull request: [SPARK-11572] Exit AsynchronousListenerBus thr...

2015-11-15 Thread ted-yu

Github user ted-yu commented on the pull request:

https://github.com/apache/spark/pull/9546#issuecomment-156866484
  
I plan to send out a PR to fix the regression by recording the number of 
queued events the first time the stop flag is seen.



[GitHub] spark pull request: [SPARK-11572] Process outstanding requests aft...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9723#issuecomment-156870251
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45961/
Test FAILed.



[GitHub] spark pull request: [SPARK-11572] Process outstanding requests aft...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9723#issuecomment-156877667
  
**[Test build #45962 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45962/consoleFull)**
 for PR 9723 at commit 
[`cb4132d`](https://github.com/apache/spark/commit/cb4132da49176adf5f98934ea06b41526ccf8cc2).



[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-15 Thread HyukjinKwon

Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156879272
  
I accidentally came across `TODO Adds test case for reading dictionary encoded 
decimals written as 'FIXED_LEN_BYTE_ARRAY'`.

I will also add this test in the follow-up PR that uses the overloaded 
`writeMetaFile`.



[GitHub] spark pull request: SPARK-6541 - Sort executors by ID (numeric)

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9165#issuecomment-156856529
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: SPARK-6541 - Sort executors by ID (numeric)

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9165#issuecomment-156856532
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45957/
Test PASSed.



[GitHub] spark pull request: SPARK-6541 - Sort executors by ID (numeric)

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9165#issuecomment-156856492
  
**[Test build #45957 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45957/consoleFull)**
 for PR 9165 at commit 
[`941db75`](https://github.com/apache/spark/commit/941db75ea21571b729ec6e35ee8b8b03190840f1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-10749][MESOS] Support multiple roles wi...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8872#issuecomment-156858796
  
**[Test build #45958 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45958/consoleFull)**
 for PR 8872 at commit 
[`7a16052`](https://github.com/apache/spark/commit/7a16052478d6f2723c57e21cb29748bce7bf25e0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11572] Exit AsynchronousListenerBus thr...

2015-11-15 Thread ted-yu

Github user ted-yu commented on the pull request:

https://github.com/apache/spark/pull/9546#issuecomment-156867526
  
Cloning the git repo was extremely slow.
Here is the proposed fix:
```
diff --git a/core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala b/core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala
index b3b54af..cc58bc5 100644
--- a/core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala
+++ b/core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala
@@ -56,19 +56,24 @@ private[spark] abstract class AsynchronousListenerBus[L <: AnyRef, E](name: Stri

   // A counter that represents the number of events produced and consumed in the queue
   private val eventLock = new Semaphore(0)
+  // limit on the number of events to process before exiting. -1 means no limit
+  private val eventLimit = -1

   private val listenerThread = new Thread(name) {
 setDaemon(true)
 override def run(): Unit = Utils.tryOrStopSparkContext(sparkContext) {
-  while (true) {
+  while (eventLimit != 0) {
 eventLock.acquire()
 self.synchronized {
   processingEvent = true
 }
 try {
   if (stopped.get()) {
-// Get out of the while loop and shutdown the daemon thread
-return
+eventLimit = eventQueue.size
+if (eventLimit == 0) {
+  // Get out of the while loop and shutdown the daemon thread
+  return
+}
   }
   val event = eventQueue.poll
   assert(event != null, "event queue was empty but the listener bus was not stopped")
```
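
For reference, the drain-on-stop idea can be sketched in isolation as follows. This is a simplified, hypothetical `DrainingBus`, not Spark's `AsynchronousListenerBus`, and it is only one possible reading of the proposal. Note that the diff as posted declares `eventLimit` as a `val` and later assigns to it, which Scala rejects at compile time; the sketch uses a local `var` instead.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, Semaphore}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical, simplified bus: on stop, process what is already queued, then exit.
class DrainingBus[E <: AnyRef](handle: E => Unit) {
  private val eventQueue = new ConcurrentLinkedQueue[E]()
  // One permit per posted event, plus one extra permit released by stop().
  private val eventLock = new Semaphore(0)
  private val stopped = new AtomicBoolean(false)

  private val listenerThread = new Thread("draining-bus") {
    setDaemon(true)
    override def run(): Unit = {
      // Events still to process once stopping; -1 means "no limit yet".
      var eventLimit = -1
      while (eventLimit != 0) {
        eventLock.acquire()
        if (eventLimit < 0 && stopped.get()) {
          // First time the stop flag is seen: record how many events are queued.
          eventLimit = eventQueue.size()
        }
        if (eventLimit != 0) {
          val event = eventQueue.poll()
          if (event != null) handle(event)
          if (eventLimit > 0) eventLimit -= 1
        }
      }
    }
  }

  def start(): Unit = listenerThread.start()

  def post(event: E): Unit = if (!stopped.get()) {
    eventQueue.offer(event)
    eventLock.release()
  }

  def stop(): Unit = if (stopped.compareAndSet(false, true)) {
    eventLock.release() // wake the thread so it notices the stop flag
    listenerThread.join()
  }
}
```

Because `stop()` joins the listener thread, every event posted before `stop()` has been handled by the time it returns.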



[GitHub] spark pull request: [SPARK-9026] Modifications to JobWaiter, Futur...

2015-11-15 Thread reggert

Github user reggert commented on a diff in the pull request:

https://github.com/apache/spark/pull/9264#discussion_r44879232
  
--- Diff: 
core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -27,7 +27,7 @@ import org.scalatest.BeforeAndAfterAll
 import org.scalatest.concurrent.Timeouts
 import org.scalatest.time.SpanSugar._
 
-import org.apache.spark.{LocalSparkContext, SparkContext, SparkException, SparkFunSuite}
+import org.apache.spark._
--- End diff --

The line needs to change regardless, because an import was added. 
Explicitly specifying 5 imported classes causes the line to exceed 100 
characters, however.



[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9542#issuecomment-156868010
  
**[Test build #45959 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45959/consoleFull)**
 for PR 9542 at commit 
[`4481c82`](https://github.com/apache/spark/commit/4481c82a98af62cc4d46d2f07c4d728236bf6d83).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9542#issuecomment-156868048
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45959/
Test PASSed.



[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9542#issuecomment-156868046
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [Spark-11522][SQL] input_file_name() returns "...

2015-11-15 Thread xwu0226

Github user xwu0226 commented on the pull request:

https://github.com/apache/spark/pull/9542#issuecomment-156869030
  
@yhuai I did not know that we should not update the resources/data 
directory. I thought the test data files were added along the way by 
contributors. Thanks for pointing it out! Let me update HiveUDFSuite then.



[GitHub] spark pull request: [SPARK-11390] [SQL] Query plan with/without fi...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9679#issuecomment-156856575
  
**[Test build #45960 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45960/consoleFull)**
 for PR 9679 at commit 
[`9597b39`](https://github.com/apache/spark/commit/9597b39795d22ccd2aaf6e3885b34eb924534174).



[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

2015-11-15 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9367#discussion_r44880215
  
--- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala ---
@@ -78,15 +79,15 @@ private[spark] object TestUtils {
   }
 
   /**
-   * Create a jar file that contains this set of files. All files will be located at the root
-   * of the jar.
+   * Create a jar file that contains this set of files. All files will be located in the specified
+   * directory or at the root of the jar.
    */
-  def createJar(files: Seq[File], jarFile: File): URL = {
+  def createJar(files: Seq[File], jarFile: File, directoryPrefix: Option[String] = None): URL = {
--- End diff --

Do we need to add `directoryPrefix`?
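
For context on what the parameter changes, here is a rough standalone sketch of a jar writer with an optional directory prefix. `createJarSketch` is a hypothetical name and this is not the body of `TestUtils.createJar`; it only illustrates the effect on entry names. A class declared in package `repro` is looked up in a jar as `repro/MyException.class`, which is presumably why the test passes `Some("repro")`.

```scala
import java.io.{File, FileOutputStream}
import java.nio.file.Files
import java.util.jar.{JarEntry, JarOutputStream}

// Sketch only: place each file under `directoryPrefix` inside the jar, or at the root.
def createJarSketch(
    files: Seq[File],
    jarFile: File,
    directoryPrefix: Option[String] = None): File = {
  val jarStream =
    new JarOutputStream(new FileOutputStream(jarFile), new java.util.jar.Manifest())
  try {
    for (file <- files) {
      // Jar entry names always use '/' as the separator, whatever the platform.
      val entryName = directoryPrefix.map(p => s"$p/${file.getName}").getOrElse(file.getName)
      jarStream.putNextEntry(new JarEntry(entryName))
      Files.copy(file.toPath, jarStream)
      jarStream.closeEntry()
    }
  } finally jarStream.close()
  jarFile
}
```

With `directoryPrefix = None` the entry name is just the file name, which matches the old "located at the root" behaviour described in the removed doc comment.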



[GitHub] spark pull request: [SPARK-11677][SQL] ORC filter tests all pass i...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9687#issuecomment-156883379
  
**[Test build #45965 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45965/consoleFull)**
 for PR 9687 at commit 
[`cd7bd12`](https://github.com/apache/spark/commit/cd7bd12337539be93198cc7c1610b7779dbef558).



[GitHub] spark pull request: [SPARK-9928][SQL] Removal of LogicalLocalTable...

2015-11-15 Thread cloud-fan

Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/9717#issuecomment-156888382
  
retest this please.



[GitHub] spark pull request: [SPARK-11572] Process outstanding requests aft...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9723#issuecomment-156889821
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45968/
Test FAILed.



[GitHub] spark pull request: [SPARK-11572] Process outstanding requests aft...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9723#issuecomment-156889817
  
**[Test build #45968 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45968/consoleFull)**
 for PR 9723 at commit 
[`8be6a7c`](https://github.com/apache/spark/commit/8be6a7ceff31ea774b0b1ce86c041ecdfd99a9e3).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11572] Process outstanding requests aft...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9723#issuecomment-156889820
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-11745][SQL] Enable more JSON parsing op...

2015-11-15 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9724#discussion_r44883818
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonParsingOptionsSuite.scala
 ---
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.json
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.test.SharedSQLContext
+
+/**
+ * Test cases for various [[JSONOptions]].
+ */
+class JsonParsingOptionsSuite extends QueryTest with SharedSQLContext {
+
+  test("allowComments off") {
+val str = """{'name': /* hello */ 'Reynold Xin'}"""
+val rdd = sqlContext.sparkContext.parallelize(Seq(str))
+val df = sqlContext.read.json(rdd)
+
+assert(df.schema.head.name == "_corrupt_record")
+  }
+
+  test("allowComments on") {
+val str = """{'name': /* hello */ 'Reynold Xin'}"""
+val rdd = sqlContext.sparkContext.parallelize(Seq(str))
+val df = sqlContext.read.option("allowComments", "true").json(rdd)
+
+assert(df.schema.head.name == "name")
+assert(df.first().getString(0) == "Reynold Xin")
+  }
+
+  test("allowSingleQuotes off") {
+val str = """{'name': 'Reynold Xin'}"""
+val rdd = sqlContext.sparkContext.parallelize(Seq(str))
+val df = sqlContext.read.option("allowSingleQuotes", "false").json(rdd)
+
+assert(df.schema.head.name == "_corrupt_record")
+  }
+
+  test("allowSingleQuotes on") {
+val str = """{'name': 'Reynold Xin'}"""
+val rdd = sqlContext.sparkContext.parallelize(Seq(str))
+val df = sqlContext.read.json(rdd)
+
+assert(df.schema.head.name == "name")
+assert(df.first().getString(0) == "Reynold Xin")
+  }
+
+  test("allowUnquotedFieldNames off") {
+val str = """{name: 'Reynold Xin'}"""
+val rdd = sqlContext.sparkContext.parallelize(Seq(str))
+val df = sqlContext.read.json(rdd)
+
+assert(df.schema.head.name == "_corrupt_record")
+  }
+
+  test("allowUnquotedFieldNames on") {
+val str = """{name: 'Reynold Xin'}"""
+val rdd = sqlContext.sparkContext.parallelize(Seq(str))
+val df = sqlContext.read.option("allowUnquotedFieldNames", "true").json(rdd)
+
+assert(df.schema.head.name == "name")
+assert(df.first().getString(0) == "Reynold Xin")
+  }
+
+  test("allowNumericLeadingZeros off") {
+val str = """{"age": 0018}"""
+val rdd = sqlContext.sparkContext.parallelize(Seq(str))
+val df = sqlContext.read.json(rdd)
+
+assert(df.schema.head.name == "_corrupt_record")
+  }
+
+  test("allowNumericLeadingZeros on") {
+val str = """{"age": 0018}"""
+val rdd = sqlContext.sparkContext.parallelize(Seq(str))
+val df = sqlContext.read.option("allowNumericLeadingZeros", "true").json(rdd)
+
+assert(df.schema.head.name == "age")
+assert(df.first().getLong(0) == 18)
+  }
+
+  // The following two tests are not really working - need to look into Jackson's
+  // JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS.
+  ignore("allowNonNumericNumbers off") {
--- End diff --

This is ignored for now. I will file a ticket once this is merged so we can
look into it in the future.




[GitHub] spark pull request: [SPARK-11745][SQL] Enable more JSON parsing op...

2015-11-15 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9724#discussion_r44883828
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JSONOptions.scala ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.json
+
+import com.fasterxml.jackson.core.{JsonParser, JsonFactory}
+
+/**
+ * Options for the JSON data source.
+ *
+ * Most of these map directly to Jackson's internal options, specified in [[JsonParser.Feature]].
+ */
+case class JSONOptions(
+samplingRatio: Double = 1.0,
+primitivesAsString: Boolean = false,
+allowComments: Boolean = false,
+allowUnquotedFieldNames: Boolean = false,
+allowSingleQuotes: Boolean = true,
+allowNumericLeadingZeros: Boolean = false,
+allowNonNumericNumbers: Boolean = false) {
--- End diff --

allowNonNumericNumbers is undocumented for now
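
For readers following the diff above, here is a minimal sketch of how string-valued reader options could be turned into a typed options object that falls back to the defaults shown in the diff. The field names and default values are copied from the diff; the `JsonOptionsSketch` name and the `fromParameters` helper are hypothetical and are not taken from the pull request.

```scala
// Hypothetical stand-in for the JSONOptions case class in the diff above.
// Field names and defaults come from the diff; the rest is an assumption.
case class JsonOptionsSketch(
    samplingRatio: Double = 1.0,
    primitivesAsString: Boolean = false,
    allowComments: Boolean = false,
    allowUnquotedFieldNames: Boolean = false,
    allowSingleQuotes: Boolean = true,
    allowNumericLeadingZeros: Boolean = false,
    allowNonNumericNumbers: Boolean = false)

object JsonOptionsSketch {
  // Hypothetical helper: builds the options from string key/value pairs,
  // keeping the default for every key that is absent.
  def fromParameters(params: Map[String, String]): JsonOptionsSketch = {
    val d = JsonOptionsSketch()

    // Reads a boolean option, or returns the supplied default.
    def flag(key: String, default: Boolean): Boolean =
      params.get(key).map(_.toBoolean).getOrElse(default)

    JsonOptionsSketch(
      samplingRatio =
        params.get("samplingRatio").map(_.toDouble).getOrElse(d.samplingRatio),
      primitivesAsString = flag("primitivesAsString", d.primitivesAsString),
      allowComments = flag("allowComments", d.allowComments),
      allowUnquotedFieldNames = flag("allowUnquotedFieldNames", d.allowUnquotedFieldNames),
      allowSingleQuotes = flag("allowSingleQuotes", d.allowSingleQuotes),
      allowNumericLeadingZeros = flag("allowNumericLeadingZeros", d.allowNumericLeadingZeros),
      allowNonNumericNumbers = flag("allowNonNumericNumbers", d.allowNonNumericNumbers))
  }
}
```

Under these assumptions, `JsonOptionsSketch.fromParameters(Map("allowComments" -> "true"))` enables comments while leaving `allowSingleQuotes` at its default of `true`. Keeping the defaults in one place means a key the caller never sets cannot drift from the documented behavior.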



[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156891545
  
**[Test build #45964 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45964/consoleFull)** for PR 9060 at commit [`cea5034`](https://github.com/apache/spark/commit/cea50348da091e5d83c14474a76d4f49e1ff3c9b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11572] Process outstanding requests aft...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9723#issuecomment-156891548
  
**[Test build #45973 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45973/consoleFull)** for PR 9723 at commit [`cd9b2f2`](https://github.com/apache/spark/commit/cd9b2f2b8e2cfaa4e6e9814baea1238f3f8bc1b4).



[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156891628
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45964/
Test PASSed.



[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9060#issuecomment-156891627
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11745][SQL] Enable more JSON parsing op...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9724#issuecomment-156891508
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-11692][SQL] Support for Parquet logical...

2015-11-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9658#issuecomment-156892687
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11745][SQL] Enable more JSON parsing op...

2015-11-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9724#issuecomment-156892716
  
**[Test build #2061 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2061/consoleFull)** for PR 9724 at commit [`00cfc19`](https://github.com/apache/spark/commit/00cfc198556dc92f67cc77e11fa1106752c99826).



[GitHub] spark pull request: [SPARK-11745][SQL] Enable more JSON parsing op...

2015-11-15 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9724#discussion_r44884636
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -227,6 +227,15 @@ class DataFrameReader private[sql](sqlContext: SQLContext) extends Logging {
* This function goes through the input once to determine the input schema. If you know the
* schema in advance, use the version that specifies the schema to avoid the extra scan.
*
+   * You can set the following JSON-specific options to deal with non-standard JSON files:
+   * `primitivesAsString` (default `false`): infers all primitive values as a string type
+   * `allowComments` (default `false`): ignores Java/C++ style comment in JSON records
+   * `allowUnquotedFieldNames` (default `false`): allows unquoted JSON field names
+   * `allowSingleQuotes` (default `true`): allows single quotes in addition to double quotes
+   * 
+   * `allowNumericLeadingZeros` (default `false`): allows leading zeros in numbers
+   * (e.g. 00012)
--- End diff --

Add `samplingRatio`?


