date:20150216

[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4642#issuecomment-74629902
  
  [Test build #27621 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27621/consoleFull)
 for   PR 4642 at commit 
[`d291c34`](https://github.com/apache/spark/commit/d291c347687da1576ba8fafc855d05f9da3419b1).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4602#issuecomment-74629893
  
  [Test build #27620 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27620/consoleFull)
 for   PR 4602 at commit 
[`f6907d2`](https://github.com/apache/spark/commit/f6907d2bb1c9aca1528e458a9a7fd9a3d58b9309).
 * This patch merges cleanly.



[GitHub] spark pull request: [SQL] [Minor] Update the HiveContext Unittest

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4584#issuecomment-74629906
  
  [Test build #27619 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27619/consoleFull)
 for   PR 4584 at commit 
[`e5bdc3a`](https://github.com/apache/spark/commit/e5bdc3a2f1847098f3f663d6e3a336cbdaf50bce).
 * This patch merges cleanly.



[GitHub] spark pull request: [SQL] [Minor] Update the HiveContext Unittest

2015-02-16 Thread chenghao-intel

Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/4584#issuecomment-74629772
  
retest this please.



[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

2015-02-16 Thread chenghao-intel

Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/4602#issuecomment-74629760
  
retest this please.



[GitHub] spark pull request: [SQL] [Minor] Update the HiveContext Unittest

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4584#issuecomment-74629480
  
  [Test build #27618 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27618/consoleFull)
 for   PR 4584 at commit 
[`e5bdc3a`](https://github.com/apache/spark/commit/e5bdc3a2f1847098f3f663d6e3a336cbdaf50bce).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...

2015-02-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4642#issuecomment-74628875
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27613/
Test FAILed.



[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4642#issuecomment-74628869
  
  [Test build #27613 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27613/consoleFull)
 for   PR 4642 at commit 
[`9be66e3`](https://github.com/apache/spark/commit/9be66e326f2fc50bb81b9f2cff82ab77714230d6).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

2015-02-16 Thread chenghao-intel

Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/4602#issuecomment-74628619
  
Thank you @yhuai , I've updated the description and rebased the code.



[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4602#issuecomment-74628623
  
  [Test build #27617 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27617/consoleFull)
 for   PR 4602 at commit 
[`f6907d2`](https://github.com/apache/spark/commit/f6907d2bb1c9aca1528e458a9a7fd9a3d58b9309).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5785] [PySpark] narrow dependency for c...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4629#issuecomment-74628437
  
  [Test build #611 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/611/consoleFull)
 for   PR 4629 at commit 
[`4d29932`](https://github.com/apache/spark/commit/4d29932172301731db904176636d530631f448ea).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: SPARK-5856: In Maven build script, launch Zinc...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4643#issuecomment-74628174
  
  [Test build #27616 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27616/consoleFull)
 for   PR 4643 at commit 
[`717cfb0`](https://github.com/apache/spark/commit/717cfb055dcdbdf682a1a891e2413ab0d66de211).
 * This patch merges cleanly.



[GitHub] spark pull request: SPARK-5856: In Maven build script, launch Zinc...

2015-02-16 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4643#issuecomment-74627979
  
/cc @brennonyork if you want to take a quick look. I'll probably merge this 
soon since it's needed for some release packaging.



[GitHub] spark pull request: SPARK-5856: In Maven build script, launch Zinc...

2015-02-16 Thread pwendell

GitHub user pwendell opened a pull request:

https://github.com/apache/spark/pull/4643

SPARK-5856: In Maven build script, launch Zinc with more memory

I've seen out of memory exceptions when trying
to run many parallel builds against the same Zinc
server during packaging. We should use the same
increased memory settings we use for Maven itself.

I tested this and confirmed that the Nailgun JVM
launched with higher memory.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pwendell/spark zinc-memory

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4643.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4643


commit 717cfb055dcdbdf682a1a891e2413ab0d66de211
Author: Patrick Wendell 
Date:   2015-02-17T07:29:39Z

SPARK-5856: Launch Zinc with larger memory options.

I've seen out of memory exceptions when trying
to run many parallel builds against the same Zinc
server during packaging. We should use the same
increased memory settings we use for Maven itself.

I tested this and confirmed that the Nailgun JVM
launched with higher memory.





[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-16 Thread chenghao-intel

Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4562#discussion_r24797498
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
@@ -121,13 +121,50 @@ class ParquetDataSourceOnMetastoreSuite extends ParquetMetastoreSuiteBase {
 
   override def beforeAll(): Unit = {
 super.beforeAll()
+
+sql(s"""
+  create table test_parquet
+  (
+intField INT,
+stringField STRING
+  )
+  ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
+   STORED AS
+   INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
+   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
+""")
+
+val rdd = sparkContext.parallelize((1 to 10).map(i => s"""{"a":$i, "b":"str${i}"}"""))
+jsonRDD(rdd).registerTempTable("jt")
+sql("""
+  create table test ROW FORMAT
+  |  SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
+  |  STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
+  |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
+  |  AS select * from jt""".stripMargin)
+
--- End diff --

Oh, I thought `STORED AS PARQUET AS ..` was just syntactic sugar. 
Unfortunately, all of the test suites are implemented in the subproject `sql`, 
but `HiveShim` is in the subproject `hive` and is only visible within the 
`hive` package.

Let's put this test in another PR?



[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4562#issuecomment-74627418
  
  [Test build #27615 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27615/consoleFull)
 for   PR 4562 at commit 
[`36978d1`](https://github.com/apache/spark/commit/36978d1835ab6e0266ad3787b33056b573fd59e8).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-16 Thread sryza

Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/4634#issuecomment-74627399
  
Are there ever situations where `combineByKey` should be used instead of 
`aggregateByKey`?  I tend to think of `combineByKey` as an internal API that's 
exposed for historical reasons.
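
For context on the question above, here is a minimal sketch in plain Python (no Spark required) of how the two operations relate. The helper names `combine_by_key` and `aggregate_by_key` and the list-of-partitions input are assumptions made for illustration; they only model the semantics and are not Spark's API. To my understanding, Spark's own `aggregateByKey` delegates to `combineByKey` in much the same way.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Plain-Python model of combineByKey (hypothetical helper, not Spark code).

    `partitions` is a list of partitions, each a list of (key, value) pairs.
    """
    # Map side: build one combiner per key inside each partition.
    per_partition = []
    for part in partitions:
        combiners = {}
        for key, value in part:
            if key in combiners:
                combiners[key] = merge_value(combiners[key], value)
            else:
                combiners[key] = create_combiner(value)
        per_partition.append(combiners)

    # Reduce side: merge the per-partition combiners for each key.
    merged = {}
    for combiners in per_partition:
        for key, comb in combiners.items():
            if key in merged:
                merged[key] = merge_combiners(merged[key], comb)
            else:
                merged[key] = comb
    return merged


def aggregate_by_key(partitions, zero, seq_op, comb_op):
    """aggregateByKey expressed through combine_by_key.

    The first value seen for a key is folded into the zero value, so the
    zero value should be immutable (or copied) in this sketch.
    """
    return combine_by_key(
        partitions,
        create_combiner=lambda v: seq_op(zero, v),
        merge_value=seq_op,
        merge_combiners=comb_op,
    )


# Per-key (sum, count) over two partitions.
data = [[("a", 1), ("b", 4)], [("a", 3)]]
sum_count = aggregate_by_key(
    data,
    zero=(0, 0),
    seq_op=lambda acc, v: (acc[0] + v, acc[1] + 1),
    comb_op=lambda x, y: (x[0] + y[0], x[1] + y[1]),
)
```

Under this model, `combine_by_key` is only strictly needed when the initial combiner has to be derived from the first value itself rather than from a fixed zero value.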



[GitHub] spark pull request: SPARK-4588 [MLLIB] [WIP] Add API for feature a...

2015-02-16 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/4460#issuecomment-74623712
  
Btw, for the feature type, besides continuous and categorical, do we want to 
make binary special? It could be treated as both continuous and categorical.



[GitHub] spark pull request: SPARK-4588 [MLLIB] [WIP] Add API for feature a...

2015-02-16 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/4460#issuecomment-74623603
  
There are two types of `Attribute(s)`: describing a feature group (a vector 
column) or describing a single feature (a scalar column). For a feature group, 
the column name becomes the group name and individual features inside this 
group may have their own names. For example, we have a vector column called 
`user` and inside this feature group we can have features named `age` and 
`gender`. When we merge multiple groups into a single feature vector, e.g., in 
a feature vector assembler, the names are flattened like `user:age` and 
`user:gender`. This answers @sryza 's question about one-hot-encoding. Assume 
that the input column is a scalar column called "country" with categories 
stored in the attribute. Then OneHotEncoder will output a vector column and 
generate feature attributes with names like `country:US`, `country:CA`, etc.

+1 on @jkbradley 's suggestion about not calling it `FeatureAttribute`. 
`Attribute` should be okay to describe a scalar column, but we also need a name 
to describe a vector column, where `Attributes` may sound a little confusing. 
I suggest `AttributeGroup`.

We don't need to care about the `FeatureType` in `mllib.tree` in this PR. 
Once we have this PR merged, we can migrate the decision tree code.
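
The flattened naming scheme described above (`user:age`, `country:US`) can be sketched in a few lines of plain Python. The function names `flatten_names` and `one_hot_names` are hypothetical and chosen only for illustration; they are not part of the API proposed in this PR.

```python
def flatten_names(groups):
    """Flatten {group_name: [feature_name, ...]} into 'group:feature' names.

    Models what a feature vector assembler might do when merging groups.
    """
    return [
        f"{group}:{feature}"
        for group, features in groups.items()
        for feature in features
    ]


def one_hot_names(column, categories):
    """Name the output features of one-hot encoding a scalar column."""
    return [f"{column}:{category}" for category in categories]


assembled = flatten_names({"user": ["age", "gender"]})
encoded = one_hot_names("country", ["US", "CA"])
```

Here `assembled` holds the names for the `user` feature group and `encoded` holds the names generated from the categories of the `country` column.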



[GitHub] spark pull request: [SPARK-4172] [PySpark] Progress API in Python

2015-02-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3027#issuecomment-74622928
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27614/
Test FAILed.



[GitHub] spark pull request: [SPARK-4172] [PySpark] Progress API in Python

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3027#issuecomment-74622925
  
  [Test build #27614 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27614/consoleFull)
 for   PR 3027 at commit 
[`d3b9253`](https://github.com/apache/spark/commit/d3b9253d3ac31f4a5178d45afaa4eb5b56eb537a).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class SparkJobInfo(namedtuple("SparkJobInfo", "jobId stageIds 
status")):`
  * `class SparkStageInfo(namedtuple("SparkStageInfo",`
  * `class StatusTracker(object):`




[GitHub] spark pull request: [SPARK-4172] [PySpark] Progress API in Python

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3027#issuecomment-74622799
  
  [Test build #27614 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27614/consoleFull)
 for   PR 3027 at commit 
[`d3b9253`](https://github.com/apache/spark/commit/d3b9253d3ac31f4a5178d45afaa4eb5b56eb537a).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5785] [PySpark] narrow dependency for c...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4629#issuecomment-74622588
  
  [Test build #611 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/611/consoleFull)
 for   PR 4629 at commit 
[`4d29932`](https://github.com/apache/spark/commit/4d29932172301731db904176636d530631f448ea).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4642#issuecomment-74622438
  
  [Test build #27613 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27613/consoleFull)
 for   PR 4642 at commit 
[`9be66e3`](https://github.com/apache/spark/commit/9be66e326f2fc50bb81b9f2cff82ab77714230d6).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-16 Thread viirya

Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4622#issuecomment-74622054
  
@mengxr okay.



[GitHub] spark pull request: [SPARK-5793][SQL] Add explode to Column

2015-02-16 Thread viirya

Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/4585#issuecomment-74621954
  
@rxin is this pr ready to go?



[GitHub] spark pull request: [SPARK-5802][MLLIB] cache transformed data in ...

2015-02-16 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4593



[GitHub] spark pull request: [SPARK-5802][MLLIB] cache transformed data in ...

2015-02-16 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/4593#issuecomment-74621554
  
Merged into master and branch-1.3.



[GitHub] spark pull request: [SPARK-5785] [PySpark] narrow dependency for c...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4629#issuecomment-74621490
  
  [Test build #27612 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27612/consoleFull)
 for   PR 4629 at commit 
[`4d29932`](https://github.com/apache/spark/commit/4d29932172301731db904176636d530631f448ea).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class Partitioner(object):`




[GitHub] spark pull request: [SPARK-5785] [PySpark] narrow dependency for c...

2015-02-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4629#issuecomment-74621492
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27612/
Test FAILed.



[GitHub] spark pull request: [SPARK-5785] [PySpark] narrow dependency for c...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4629#issuecomment-74621421
  
  [Test build #27612 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27612/consoleFull)
 for   PR 4629 at commit 
[`4d29932`](https://github.com/apache/spark/commit/4d29932172301731db904176636d530631f448ea).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-74621154
  
  [Test build #27611 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27611/consoleFull)
 for   PR 4620 at commit 
[`88e4b05`](https://github.com/apache/spark/commit/88e4b05094eb64bf4d85f54c7a5e2037bbc6f06a).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-74621158
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27611/
Test FAILed.



[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-74621085
  
  [Test build #27611 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27611/consoleFull)
 for   PR 4620 at commit 
[`88e4b05`](https://github.com/apache/spark/commit/88e4b05094eb64bf4d85f54c7a5e2037bbc6f06a).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5826][Streaming] Fix Configuration not ...

2015-02-16 Thread jerryshao

Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/4612#issuecomment-74619833
  
Yeah, I'm now sure that `Configuration` is just a constructor parameter, not a 
field, so `@transient` is not needed. I have a local test that verifies this.
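
For readers following along, here is a minimal sketch of that distinction. It is not the code from this PR: `java.util.Properties` stands in for Hadoop's `Configuration`, and the class and helper names are hypothetical. A constructor parameter that is only read while the constructor runs is not kept as a field, so there is nothing to mark `@transient`; a parameter read from a method is captured as a field.

```scala
import java.util.Properties

object CtorParamSketch {
  // `conf` is only read while the constructor runs, so the compiler
  // does not need to keep it around as a field.
  class UsedInConstructorOnly(conf: Properties) extends Serializable {
    val size: Int = conf.size()
  }

  // `conf` is read from a method, so it has to be stored as a field
  // and would be serialized along with the instance.
  class UsedInMethod(conf: Properties) extends Serializable {
    def size: Int = conf.size()
  }

  // True if the class declares a field of type Properties.
  def holdsProperties(cls: Class[_]): Boolean =
    cls.getDeclaredFields.exists(_.getType == classOf[Properties])
}
```

Calling `holdsProperties` on each class is one way to check locally which of the two shapes a given class has.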



[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-16 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4562#discussion_r24794112
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -287,7 +287,11 @@ case class ParquetRelation2(
 }
   }
 
-  parquetSchema = maybeSchema.getOrElse(readSchema())
+  try {
+parquetSchema = readSchema().getOrElse(maybeSchema.get)
+  } catch {
+case e => throw new SparkException(s"Failed to find schema for ${paths.mkString(",")}", e)
+  }
--- End diff --

Based on Cheng's comment at 
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L194,
 I think that it is better to keep `maybeMetastoreSchema` and we just fix the 
bug for now.



[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-16 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4562#discussion_r24793987
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -630,11 +635,12 @@ object ParquetRelation2 {
 sqlContext.conf.isParquetBinaryAsString,
 sqlContext.conf.isParquetINT96AsTimestamp))
   }
-}.reduce { (left, right) =>
-  try left.merge(right) catch { case e: Throwable =>
-throw new SparkException(s"Failed to merge incompatible schemas $left and $right", e)
-  }
-}
+}.foldLeft[StructType](null) {
--- End diff --

All right. Instead of putting a large code block in `Option`, how about using 
a temporary `val` and then wrapping it in `Option` at the end of this method?
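
A minimal sketch of that suggestion, under stated assumptions: `Schema` is a hypothetical stand-in for `StructType` (a map from field name to type name), and `merge` and `mergeAll` are illustrative names, not the methods in `newParquet.scala`. The fold keeps the `null` seed from the diff, stores the result in a temporary `val`, and wraps it in `Option` once at the end.

```scala
object SchemaMergeSketch {
  // Hypothetical stand-in for StructType: field name -> type name.
  type Schema = Map[String, String]

  // Merges two schemas, failing if a shared field has different types.
  def merge(left: Schema, right: Schema): Schema =
    right.foldLeft(left) { case (acc, (name, tpe)) =>
      acc.get(name) match {
        case Some(existing) if existing != tpe =>
          throw new IllegalArgumentException(
            s"Failed to merge incompatible schemas $left and $right")
        case _ => acc + (name -> tpe)
      }
    }

  // Fold with a null seed into a temporary val, then wrap it in Option
  // once at the end so callers never see the null.
  def mergeAll(schemas: Seq[Schema]): Option[Schema] = {
    val merged = schemas.foldLeft[Schema](null) { (acc, next) =>
      if (acc == null) next else merge(acc, next)
    }
    Option(merged)
  }
}
```

In this simplified setting `schemas.reduceOption(merge)` would return the same `Option` without a `null` seed; whether that fits the real method depends on code not shown in the diff.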



[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-16 Thread chenghao-intel

Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4562#discussion_r24793945
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -287,7 +287,11 @@ case class ParquetRelation2(
 }
   }
 
-  parquetSchema = maybeSchema.getOrElse(readSchema())
+  try {
+parquetSchema = readSchema().getOrElse(maybeSchema.get)
+  } catch {
+case e => throw new SparkException(s"Failed to find schema for ${paths.mkString(",")}", e)
+  }
--- End diff --

After reading the source code, I wonder whether `maybeMetastoreSchema` is 
redundant; it should probably always be converted into `maybeSchema` when 
creating the `ParquetRelation2` instance.



[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4231#issuecomment-74617889
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27608/
Test PASSed.



[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4231#issuecomment-74617885
  
  [Test build #27608 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27608/consoleFull)
 for   PR 4231 at commit 
[`58c19a5`](https://github.com/apache/spark/commit/58c19a5399e09329631a455ec2e535f71e31ed97).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-16 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4562#discussion_r24793769
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
@@ -121,13 +121,50 @@ class ParquetDataSourceOnMetastoreSuite extends ParquetMetastoreSuiteBase {
 
   override def beforeAll(): Unit = {
 super.beforeAll()
+
+sql(s"""
+  create table test_parquet
+  (
+intField INT,
+stringField STRING
+  )
+  ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
+   STORED AS
+   INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
+   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
+""")
+
+val rdd = sparkContext.parallelize((1 to 10).map(i => s"""{"a":$i, "b":"str${i}"}"""))
+jsonRDD(rdd).registerTempTable("jt")
+sql("""
+  create table test ROW FORMAT
+  |  SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
+  |  STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
+  |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
+  |  AS select * from jt""".stripMargin)
+
--- End diff --

How about we use `if (HiveShim.version == "0.13.1")` to check the Hive 
version, as we did in 
https://github.com/apache/spark/commit/e0490e271d078aa55d7c7583e2ba80337ed1b0c4?
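
One possible reading of this suggestion, as a sketch only: the helper below is hypothetical, `hiveVersion` stands in for `HiveShim.version`, and the DDL string is illustrative rather than taken from the suite. The idea is that statements relying on `STORED AS PARQUET` are only issued when the Hive version is the one that supports them.

```scala
object HiveVersionGateSketch {
  // Hypothetical helper: `hiveVersion` stands in for HiveShim.version.
  // Returns the DDL to run during test setup, or nothing on other versions.
  def setupStatements(hiveVersion: String): Seq[String] =
    if (hiveVersion == "0.13.1") {
      Seq("CREATE TABLE test_parquet (intField INT, stringField STRING) STORED AS PARQUET")
    } else {
      Seq.empty
    }
}
```

A suite could then run `setupStatements(HiveShim.version).foreach(sql)` in `beforeAll()`, assuming `sql` is the helper already used in the diff.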



[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-16 Thread chenghao-intel

Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4562#discussion_r24793692
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -630,11 +635,12 @@ object ParquetRelation2 {
 sqlContext.conf.isParquetBinaryAsString,
 sqlContext.conf.isParquetINT96AsTimestamp))
   }
-}.reduce { (left, right) =>
-  try left.merge(right) catch { case e: Throwable =>
-throw new SparkException(s"Failed to merge incompatible schemas $left and $right", e)
-  }
-}
+}.foldLeft[StructType](null) {
--- End diff --

Yeah, I tried that as well, but using `null` seems simpler, since 
`Option` requires some extra value-extraction code.



[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-16 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4562#discussion_r24793676
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -287,7 +287,11 @@ case class ParquetRelation2(
 }
   }
 
-  parquetSchema = maybeSchema.getOrElse(readSchema())
+  try {
+parquetSchema = readSchema().getOrElse(maybeSchema.get)
+  } catch {
+case e => throw new SparkException(s"Failed to find schema for ${paths.mkString(",")}", e)
+  }
--- End diff --

Also, it seems we do not need `try ... catch` here.
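
As a sketch of one way to keep a descriptive failure without `try ... catch` (an assumption about intent, not the change that was merged): the names `readSchema`, `maybeSchema` and `paths` mirror the diff, but a schema is modelled as a plain `String` and `resolveSchema` is a hypothetical helper. The lookup order follows the patched line, file schema first and user-supplied schema second.

```scala
object SchemaFallbackSketch {
  // Hypothetical stand-ins: `readSchema` and `maybeSchema` mirror the names
  // in the diff, but a schema is modelled as a plain String here.
  def resolveSchema(
      readSchema: () => Option[String],
      maybeSchema: Option[String],
      paths: Seq[String]): String =
    readSchema()
      .orElse(maybeSchema)
      .getOrElse(throw new IllegalStateException(
        s"Failed to find schema for ${paths.mkString(",")}"))
}
```

Chaining `orElse` and `getOrElse` avoids calling `.get` on an empty `Option`, so the only exception raised is the one with the explicit message.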



[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-16 Thread chenghao-intel

Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/4562#discussion_r24793419
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
@@ -121,13 +121,50 @@ class ParquetDataSourceOnMetastoreSuite extends ParquetMetastoreSuiteBase {
 
   override def beforeAll(): Unit = {
 super.beforeAll()
+
+sql(s"""
+  create table test_parquet
+  (
+intField INT,
+stringField STRING
+  )
+  ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
+   STORED AS
+   INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
+   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
+""")
+
+val rdd = sparkContext.parallelize((1 to 10).map(i => s"""{"a":$i, "b":"str${i}"}"""))
+jsonRDD(rdd).registerTempTable("jt")
+sql("""
+  create table test ROW FORMAT
+  |  SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
+  |  STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
+  |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
+  |  AS select * from jt""".stripMargin)
+
--- End diff --

`STORED AS PARQUET` is only supported since Hive 0.13, so the unit test may 
fail in Hive 0.12 if we do that.



[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4642#issuecomment-74617269
  
  [Test build #27610 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27610/consoleFull)
 for   PR 4642 at commit 
[`d56afc2`](https://github.com/apache/spark/commit/d56afc24178642ed13995877ce0d851175340584).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...

2015-02-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4642#issuecomment-74617271
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27610/
Test FAILed.



[GitHub] spark pull request: [SPARK-5825] [Spark Submit] Remove the double ...

2015-02-16 Thread chenghao-intel

Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/4611#issuecomment-74617241
  
@srowen I've tested on both Ubuntu 12.04 and CentOS 6.5: `ps -f -p ...` 
only prints the first 4096 characters of the arguments.
By the way, I also checked `hadoop-daemon.sh` in Hadoop 2.3; it doesn't 
seem to confirm the process name as we do in `spark-daemon.sh`. 
Or can we just confirm that it's a Java process?



[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4642#issuecomment-74617207
  
  [Test build #27610 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27610/consoleFull)
 for   PR 4642 at commit 
[`d56afc2`](https://github.com/apache/spark/commit/d56afc24178642ed13995877ce0d851175340584).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...

2015-02-16 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/4642#discussion_r24792972
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -792,16 +896,73 @@ trait DataFrame extends RDDApi[Row] with Serializable {
* :: Experimental ::
* Adds the rows from this RDD to the specified table.
* Throws an exception if the table already exists.
+   * @group output
*/
   @Experimental
   def insertInto(tableName: String): Unit = insertInto(tableName, overwrite = false)
 
   /**
* Returns the content of the [[DataFrame]] as a RDD of JSON strings.
+   * @group rdd
*/
   def toJSON: RDD[String]
 
   

+  // JDBC Write Support
+  

+
+  /**
+   * Save this RDD to a JDBC database at `url` under the table name `table`.
+   * This will run a `CREATE TABLE` and a bunch of `INSERT INTO` statements.
+   * If you pass `true` for `allowExisting`, it will drop any table with the
+   * given name; if you pass `false`, it will throw if the table already
+   * exists.
+   * @group output
+   */
+  def createJDBCTable(url: String, table: String, allowExisting: Boolean) {
--- End diff --

The impl should go into DataFrameImpl, shouldn't it?
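
To illustrate the split being asked for, here is a small sketch with hypothetical names (`Frame`, `FrameImpl`, `createTable`); it is not the `DataFrame`/`DataFrameImpl` code and does not touch JDBC. The trait carries only the documented signature, while the body lives in a private implementation class.

```scala
object TraitImplSketch {
  // Public interface: only the signature and its documentation live here.
  trait Frame {
    /** Returns the DDL that would create `table`. */
    def createTable(table: String, allowExisting: Boolean): String
  }

  // Concrete behaviour lives in the implementation class.
  private class FrameImpl(columns: Seq[(String, String)]) extends Frame {
    override def createTable(table: String, allowExisting: Boolean): String = {
      val cols = columns.map { case (name, tpe) => s"$name $tpe" }.mkString(", ")
      // Mirrors the doc comment in the diff: `true` drops an existing table first.
      val prefix = if (allowExisting) s"DROP TABLE IF EXISTS $table; " else ""
      s"${prefix}CREATE TABLE $table ($cols)"
    }
  }

  def apply(columns: Seq[(String, String)]): Frame = new FrameImpl(columns)
}
```

Keeping the trait free of bodies means the public API surface can be read and documented separately from how it is executed.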



[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...

2015-02-16 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/4642#discussion_r24792776
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -17,6 +17,10 @@
 
 package org.apache.spark.sql
 
+import java.sql.DriverManager
+
+import org.apache.spark.sql.jdbc.JDBCWriteDetails
--- End diff --

nit: import order



[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...

2015-02-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4642#issuecomment-74616986
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27609/
Test FAILed.



[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4642#issuecomment-74616985
  
  [Test build #27609 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27609/consoleFull)
 for   PR 4642 at commit 
[`f004747`](https://github.com/apache/spark/commit/f004747ad0e351306a6747e44e310961f35c650c).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4642#issuecomment-74616944
  
  [Test build #27609 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27609/consoleFull)
 for   PR 4642 at commit 
[`f004747`](https://github.com/apache/spark/commit/f004747ad0e351306a6747e44e310961f35c650c).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5722][SQL] fix for infer long type in p...

2015-02-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4641#issuecomment-74616924
  
Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-5166][SPARK-5247][SPARK-5258][SQL] API ...

2015-02-16 Thread marmbrus

GitHub user marmbrus opened a pull request:

https://github.com/apache/spark/pull/4642

[SPARK-5166][SPARK-5247][SPARK-5258][SQL] API Cleanup / Documentation



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marmbrus/spark docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4642.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4642


commit 42e2b73371468bbec63648368044bd7e77f888a4
Author: Michael Armbrust 
Date:   2015-02-17T04:06:13Z

[SQL] Documentation / API Clean-up.

commit c4a907b40e5404d944d579cfe93d5250241c2afe
Author: Michael Armbrust 
Date:   2015-02-17T04:35:17Z

fix tests

commit f004747ad0e351306a6747e44e310961f35c650c
Author: Michael Armbrust 
Date:   2015-02-17T04:37:55Z

fix build





[GitHub] spark pull request: [SPARK-5722][SQL] fix for infer long type in p...

2015-02-16 Thread dondrake

GitHub user dondrake opened a pull request:

https://github.com/apache/spark/pull/4641

[SPARK-5722][SQL] fix for infer long type in python similar to Java long 
(master branch)


Corresponding fix for SPARK-5722 for the master (1.3) branch.  See Pull 
#4521 for 1.2 version.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dondrake/spark drake_python_long

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4641.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4641


commit 79f136d5ab347bf2c9afe8d0c5c29bcdc214e634
Author: Don Drake 
Date:   2015-02-17T04:11:31Z

SPARK-5722 fixes for inferring LongType

commit 9aa0737844ae66b25487f1d5979bdf4f7a23eddd
Author: Don Drake 
Date:   2015-02-17T04:45:04Z

Merge branch 'master' into drake_python_long

Conflicts:
python/pyspark/sql/dataframe.py





[GitHub] spark pull request: [SPARK-5853][SQL] Schema support in Row.

2015-02-16 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4640



[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

2015-02-16 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4601#issuecomment-74616323
  
I merged this into `master` (1.4.0), `branch-1.3` (1.3.0), and `branch-1.2` 
(1.2.2), but did so _right_ before I noticed that there's [a 
comment](https://issues.apache.org/jira/browse/SPARK-5363?focusedCommentId=14323623&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14323623)
 on JIRA suggesting that this didn't fix the freeze.  I guess I was a bit too 
trigger-happy here since I wanted to try to squeeze a fix in for 1.3.0.



[GitHub] spark pull request: [SPARK-5395] [PySpark] fix python process leak...

2015-02-16 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4238#issuecomment-74616043
  
I've merged this into `branch-1.2` (1.2.2) as well.



[GitHub] spark pull request: Added support for accessing secured HDFS

2015-02-16 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2320#issuecomment-74616039
  
Let's close this issue. There is an alternative PR that is currently 
ongoing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-16 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4562#discussion_r24791828
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -208,14 +208,14 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
 ParquetRelation2(
   paths,
   Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json),
-  None,
+  Some(metastoreSchema),
   Some(partitionSpec))(hive))
 } else {
   val paths = Seq(metastoreRelation.hiveQlTable.getDataLocation.toString)
-  LogicalRelation(
-ParquetRelation2(
+  LogicalRelation(ParquetRelation2(
   paths,
-  Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json))(hive))
+  Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json),
+  Some(metastoreSchema))(hive))
--- End diff --

OK, we can leave this file unchanged.



[GitHub] spark pull request: SPARK-5850: Remove experimental label for Scal...

2015-02-16 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4638



[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-16 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4562#discussion_r24791821
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/parquet/parquetSuites.scala ---
@@ -121,13 +121,50 @@ class ParquetDataSourceOnMetastoreSuite extends 
ParquetMetastoreSuiteBase {
 
   override def beforeAll(): Unit = {
 super.beforeAll()
+
+sql(s"""
+  create table test_parquet
+  (
+intField INT,
+stringField STRING
+  )
+  ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
+   STORED AS
+   INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
+   OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
+""")
+
+val rdd = sparkContext.parallelize((1 to 10).map(i => s"""{"a":$i, 
"b":"str${i}"}"""))
+jsonRDD(rdd).registerTempTable("jt")
+sql("""
+  create table test ROW FORMAT
+  |  SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
+  |  STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
+  |  OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
+  |  AS select * from jt""".stripMargin)
+
--- End diff --

Also add a test for `CREATE TABLE ... STORED AS PARQUET AS ...`?



[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

2015-02-16 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4601



[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-16 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4562#discussion_r24791812
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -630,11 +635,12 @@ object ParquetRelation2 {
 sqlContext.conf.isParquetBinaryAsString,
 sqlContext.conf.isParquetINT96AsTimestamp))
   }
-}.reduce { (left, right) =>
-  try left.merge(right) catch { case e: Throwable =>
-throw new SparkException(s"Failed to merge incompatible schemas 
$left and $right", e)
-  }
-}
+}.foldLeft[StructType](null) {
--- End diff --

How about using `None` instead of `null`?
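
A minimal sketch of what this suggestion could look like, assuming a simplified stand-in for `StructType`. The `Schema` class and `MergeSchemas` object below are hypothetical and exist only for illustration; they are not the PR's code. The fold starts from `Option.empty` so that "no schema seen yet" is represented by `None` rather than `null`:

```scala
// Hypothetical stand-in for StructType: a schema is just a set of field names.
final case class Schema(fields: Set[String]) {
  def merge(other: Schema): Schema = Schema(fields ++ other.fields)
}

object MergeSchemas {
  // Fold with Option[Schema] as the accumulator, so an empty input
  // yields None instead of a null schema.
  def mergeAll(schemas: Seq[Schema]): Option[Schema] =
    schemas.foldLeft(Option.empty[Schema]) {
      case (None, next)         => Some(next)
      case (Some(merged), next) => Some(merged.merge(next))
    }
}
```

For a plain pairwise merge like this one, `schemas.reduceOption(_ merge _)` would express the same thing more compactly.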



[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-16 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4562#discussion_r24791802
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala ---
@@ -287,7 +287,11 @@ case class ParquetRelation2(
 }
   }
 
-  parquetSchema = maybeSchema.getOrElse(readSchema())
+  try {
+parquetSchema = readSchema().getOrElse(maybeSchema.get)
+  } catch {
+case e => throw new SparkException(s"Failed to find schema for 
${paths.mkString(",")}", e)
+  }
--- End diff --

How about this?
```
parquetSchema = {
  if (maybeSchema.isDefined) {
    maybeSchema.get
  } else {
    (readSchema(), maybeMetastoreSchema) match {
      case (Some(dataSchema), _) => dataSchema
      case (None, Some(metastoreSchema)) => metastoreSchema
      case (None, None) =>
        throw new SparkException("Failed to get the schema.")
    }
  }
}
```
We first check whether `maybeSchema` is defined. If not, we read the schema
from existing data. If there is no existing data, we are dealing with a newly
created empty table, so we use the `maybeMetastoreSchema` defined in the options.
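
The same precedence order can also be written as a chain of `orElse` calls. This is only a self-contained sketch under stated assumptions: `SchemaResolution`, the `Schema` alias, and passing `readSchema` as a function are hypothetical stand-ins, and `IllegalStateException` replaces `SparkException` so the snippet has no Spark dependency.

```scala
object SchemaResolution {
  // Hypothetical stand-in for StructType.
  type Schema = Seq[String]

  // Precedence: user-specified schema, then schema read from existing data,
  // then the metastore schema. orElse takes its argument by name, so
  // readSchema() is only invoked when maybeSchema is empty.
  def resolve(
      maybeSchema: Option[Schema],
      readSchema: () => Option[Schema],
      maybeMetastoreSchema: Option[Schema]): Schema =
    maybeSchema
      .orElse(readSchema())
      .orElse(maybeMetastoreSchema)
      .getOrElse(throw new IllegalStateException("Failed to get the schema."))
}
```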



[GitHub] spark pull request: [SPARK-5363] [PySpark] check ending mark in no...

2015-02-16 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4601#issuecomment-74615791
  
LGTM.  I'm going to merge this into `master` (1.4.0), `branch-1.3` (1.3.0), 
and `branch-1.2` (1.2.2).



[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4562#issuecomment-74615738
  
  [Test build #27607 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27607/consoleFull)
 for   PR 4562 at commit 
[`a04930b`](https://github.com/apache/spark/commit/a04930badb291e55ba4e6ba79ce781a89f827932).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4562#issuecomment-74615739
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27607/
Test PASSed.



[GitHub] spark pull request: [SPARK-5853][SQL] Schema support in Row.

2015-02-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4640#issuecomment-74615587
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27606/
Test PASSed.



[GitHub] spark pull request: [SPARK-5853][SQL] Schema support in Row.

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4640#issuecomment-74615582
  
  [Test build #27606 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27606/consoleFull)
 for   PR 4640 at commit 
[`9c6f569`](https://github.com/apache/spark/commit/9c6f569139fcca2152c07bcef340afed2bef0778).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class GenericRowWithSchema(values: Array[Any], override val schema: 
StructType)`




[GitHub] spark pull request: [SPARK-5722][SQL] fix for infer long type in p...

2015-02-16 Thread dondrake

Github user dondrake commented on the pull request:

https://github.com/apache/spark/pull/4521#issuecomment-74615073
  
OK, this PR, which is against branch-1.2, is now updated, and I've verified 
that the tests are now passing.

I created another branch off of master (named drake_python_long) that 
has the changes needed for v1.3.  I'll create another PR for that one.

Please test.



[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-16 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4562#discussion_r24791423
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -208,14 +208,14 @@ private[hive] class HiveMetastoreCatalog(hive: 
HiveContext) extends Catalog with
 ParquetRelation2(
   paths,
   Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json),
-  None,
+  Some(metastoreSchema),
   Some(partitionSpec))(hive))
 } else {
   val paths = 
Seq(metastoreRelation.hiveQlTable.getDataLocation.toString)
-  LogicalRelation(
-ParquetRelation2(
+  LogicalRelation(ParquetRelation2(
   paths,
-  Map(ParquetRelation2.METASTORE_SCHEMA -> 
metastoreSchema.json))(hive))
+  Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json),
+  Some(metastoreSchema))(hive))
--- End diff --

Oh, but if we do `CREATE TABLE`, we have to pass the metastore schema.



[GitHub] spark pull request: SPARK-4588 [MLLIB] [WIP] Add API for feature a...

2015-02-16 Thread sryza

Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/4460#issuecomment-74614834
  
This is perhaps contained in @jkbradley's question, but how does this work 
with features that are represented by multiple entries in the feature vector, 
e.g. when we're doing a one-hot encoding?  With a one-hot encoding, is each 
category its own feature, or can a feature span multiple indices in the vector?
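
To make the question concrete, here is a small sketch of a one-hot encoding in which a single categorical feature occupies several consecutive vector indices. The `OneHot` object and `encode` helper are hypothetical illustrations and not part of the proposed attribute API:

```scala
object OneHot {
  // Encodes one categorical value as a vector with one slot per known category.
  // A single logical feature (e.g. "color") therefore spans categories.length
  // indices of the resulting feature vector.
  def encode(category: String, categories: Seq[String]): Array[Double] =
    categories.map(c => if (c == category) 1.0 else 0.0).toArray
}
```

With categories `Seq("red", "green", "blue")`, the single feature "color" takes up three indices, which is exactly the case where per-index attributes and per-feature attributes differ.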



[GitHub] spark pull request: SPARK-5850: Remove experimental label for Scal...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4638#issuecomment-74614179
  
  [Test build #27603 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27603/consoleFull)
 for   PR 4638 at commit 
[`386126f`](https://github.com/apache/spark/commit/386126fb9e60e8c7a08bb098b366b78d335750be).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: SPARK-5850: Remove experimental label for Scal...

2015-02-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4638#issuecomment-74614187
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27603/
Test PASSed.



[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-74614095
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27604/
Test PASSed.



[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-74614087
  
  [Test build #27604 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27604/consoleFull)
 for   PR 4620 at commit 
[`673e4e3`](https://github.com/apache/spark/commit/673e4e3c2720ef88a4316656ba9972d06d17980c).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-5852] [SQL] Passdown the schema for Par...

2015-02-16 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/4562#discussion_r24791099
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -208,14 +208,14 @@ private[hive] class HiveMetastoreCatalog(hive: 
HiveContext) extends Catalog with
 ParquetRelation2(
   paths,
   Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json),
-  None,
+  Some(metastoreSchema),
   Some(partitionSpec))(hive))
 } else {
   val paths = 
Seq(metastoreRelation.hiveQlTable.getDataLocation.toString)
-  LogicalRelation(
-ParquetRelation2(
+  LogicalRelation(ParquetRelation2(
   paths,
-  Map(ParquetRelation2.METASTORE_SCHEMA -> 
metastoreSchema.json))(hive))
+  Map(ParquetRelation2.METASTORE_SCHEMA -> metastoreSchema.json),
+  Some(metastoreSchema))(hive))
--- End diff --

I think we cannot do it. See 
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L194



[GitHub] spark pull request: SPARK-5841: remove DiskBlockManager shutdown h...

2015-02-16 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4627#issuecomment-74613715
  
@pwendell I suppose we could revert.  A bandaid patch would be to just wrap 
this in a try block and ignore exceptions thrown when removing the hook (see 
also: `Utils.inShutdown()`).  What do you think about just hotfixing in a `Try`?
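
A minimal sketch of the kind of `Try` hotfix being proposed, assuming the hook is a plain JVM shutdown hook `Thread`. The `ShutdownHookCleanup` object and `removeHookQuietly` name are hypothetical; the actual change in `DiskBlockManager` may look different. `Runtime.removeShutdownHook` throws `IllegalStateException` when the JVM is already shutting down, and wrapping the call in `Try` swallows that:

```scala
import scala.util.Try

object ShutdownHookCleanup {
  // Returns true if the hook was registered and has been removed.
  // Returns false if it was not registered, or if removal failed because
  // the JVM is already shutting down (the exception is ignored).
  def removeHookQuietly(hook: Thread): Boolean =
    Try(Runtime.getRuntime.removeShutdownHook(hook)).getOrElse(false)
}
```

Note that `Try` only catches non-fatal exceptions, so errors such as `OutOfMemoryError` would still propagate.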



[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4231#issuecomment-74613572
  
  [Test build #27608 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27608/consoleFull)
 for   PR 4231 at commit 
[`58c19a5`](https://github.com/apache/spark/commit/58c19a5399e09329631a455ec2e535f71e31ed97).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-16 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4231#issuecomment-74612880
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27605/
Test FAILed.



[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4231#issuecomment-74612876
  
  [Test build #27605 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27605/consoleFull)
 for   PR 4231 at commit 
[`b7c5581`](https://github.com/apache/spark/commit/b7c558174dc15e212227d193d341dfe120cb7634).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SQL] [Minor] Passdown the schema for Parquet ...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4562#issuecomment-74610745
  
  [Test build #27607 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27607/consoleFull)
 for   PR 4562 at commit 
[`a04930b`](https://github.com/apache/spark/commit/a04930badb291e55ba4e6ba79ce781a89f827932).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5853][SQL] Schema support in Row.

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4640#issuecomment-74610755
  
  [Test build #27606 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27606/consoleFull)
 for   PR 4640 at commit 
[`9c6f569`](https://github.com/apache/spark/commit/9c6f569139fcca2152c07bcef340afed2bef0778).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5853][SQL] Schema support in Row.

2015-02-16 Thread rxin

GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/4640

[SPARK-5853][SQL] Schema support in Row.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-5853

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4640.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4640


commit 9c6f569139fcca2152c07bcef340afed2bef0778
Author: Reynold Xin 
Date:   2015-02-17T03:04:28Z

[SPARK-5853][SQL] Schema support in Row.





[GitHub] spark pull request: [SQL] Various DataFrame doc changes.

2015-02-16 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4636



[GitHub] spark pull request: SPARK-5841: remove DiskBlockManager shutdown h...

2015-02-16 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4627#issuecomment-74609998
  
@JoshRosen should we revert this in 1.3 then? I might create a release 
candidate soon to kick off community testing.



[GitHub] spark pull request: SPARK-4588 [MLLIB] [WIP] Add API for feature a...

2015-02-16 Thread jkbradley

Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/4460#issuecomment-74609769
  
I like the current sketch but also want to think about it more.  A few 
thoughts:

I'm not quite clear on how the Array of Attributes in FeatureAttributes 
corresponds to the columns of the DataFrame.  Is it one-to-one, or will 
Attributes be nested?  (I'm basically thinking about groups of features, 
especially individual features grouped into vectors.)

How will propagation of feature names work?  Will we try to impose a 
standard, such as Transformers maintaining the same (or a modified) feature 
name whenever possible?

By the way, do we want to call this "FeatureAttributes," or should we name 
it something like "ColumnAttributes" so it more obviously applies to other 
types of columns like labels, users, products, etc.?

+1 for moving FeatureType from mllib.tree to attribute.  It should be more 
general.



[GitHub] spark pull request: [SPARK-4081] [mllib] DatasetIndexer

2015-02-16 Thread jkbradley

Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3000#issuecomment-74609759
  
@sryza  Thanks for offering!  That would be great if you have the bandwidth 
to work on this.  I'd be happy to help review.

One comment: It would be nice to be able to take advantage of 
FeatureAttributes in the spark.ml package, but that's a WIP right now: 
[https://github.com/apache/spark/pull/4460]



[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4231#issuecomment-74609378
  
  [Test build #27605 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27605/consoleFull)
 for   PR 4231 at commit 
[`b7c5581`](https://github.com/apache/spark/commit/b7c558174dc15e212227d193d341dfe120cb7634).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-16 Thread MechCoder

Github user MechCoder commented on the pull request:

https://github.com/apache/spark/pull/4231#issuecomment-74609225
  
@jkbradley fixed!



[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-16 Thread MechCoder

Github user MechCoder commented on a diff in the pull request:

https://github.com/apache/spark/pull/4231#discussion_r24789326
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -1064,9 +1045,12 @@ object DecisionTree extends Serializable with Logging {
   //   Bins correspond to feature values, so we do not need to compute splits or bins
   //   beforehand.  Splits are constructed as needed during training.
   splits(featureIndex) = new Array[Split](0)
-  bins(featureIndex) = new Array[Bin](0)
 }
-  }
+// For ordered features, bins correspond to feature values.
+// For unordered categorical features, there is no need to construct the bins.
+// since there is a one-to-one correspondence between the splits and the bins.
+bins(featureIndex) = new Array[Bin](0)
+}
--- End diff --

Do you mean to move the closing brace 2 spaces ahead? 



[GitHub] spark pull request: SPARK-5850: Remove experimental label for Scal...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4638#issuecomment-74609032
  
  [Test build #27603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27603/consoleFull) for PR 4638 at commit [`386126f`](https://github.com/apache/spark/commit/386126fb9e60e8c7a08bb098b366b78d335750be).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-16 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-74609026
  
  [Test build #27604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27604/consoleFull) for PR 4620 at commit [`673e4e3`](https://github.com/apache/spark/commit/673e4e3c2720ef88a4316656ba9972d06d17980c).
 * This patch merges cleanly.



[GitHub] spark pull request: SPARK-5841: remove DiskBlockManager shutdown h...

2015-02-16 Thread MattWhelan

Github user MattWhelan commented on the pull request:

https://github.com/apache/spark/pull/4627#issuecomment-74608953
  
Kinda weird that it would pass sometimes and fail others. I'll submit a fix tomorrow.



[GitHub] spark pull request: SPARK-5850: Remove experimental label for Scal...

2015-02-16 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4638#issuecomment-74608855
  
Jenkins retest this please.



[GitHub] spark pull request: [SPARK-5785] [PySpark] narrow dependency for c...

2015-02-16 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4629#issuecomment-74607728
  
LGTM overall; this is tricky logic, though, so I'll take one more pass 
through when I get home.



[GitHub] spark pull request: SPARK-5841: remove DiskBlockManager shutdown h...

2015-02-16 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4627#issuecomment-74607180
  
I saw it during shutdown from a failed Jenkins test run: 
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/1616/console



[GitHub] spark pull request: SPARK-5841: remove DiskBlockManager shutdown h...

2015-02-16 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4627#issuecomment-74607089
  
Ah, how did you trigger that BTW? Yes I will help get it patched. I imagine 
the stop logic must be factored into a method called both by the hook and by 
stop, which can still unregister the hook first. 
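
A rough sketch of that refactoring, for illustration only. `ManagedResource`, `doStop`, and `isStopped` are hypothetical names and this is not the actual `DiskBlockManager` code; it only assumes the standard JVM `Runtime.addShutdownHook` / `Runtime.removeShutdownHook` calls.

```scala
// Hypothetical sketch; names are illustrative, not Spark's DiskBlockManager.
class ManagedResource {
  @volatile private var stopped = false

  // The shutdown hook only delegates to the shared cleanup method.
  private val shutdownHook = new Thread("managed-resource-shutdown-hook") {
    override def run(): Unit = doStop()
  }
  Runtime.getRuntime.addShutdownHook(shutdownHook)

  /** Explicit stop: unregister the hook first, then run the shared cleanup. */
  def stop(): Unit = {
    try {
      Runtime.getRuntime.removeShutdownHook(shutdownHook)
    } catch {
      // Thrown when the JVM is already shutting down, in which case the hook
      // cannot be removed. The cleanup below is idempotent, so this is safe.
      case _: IllegalStateException =>
    }
    doStop()
  }

  /** Shared cleanup, safe to call from both stop() and the hook. */
  private def doStop(): Unit = synchronized {
    if (!stopped) {
      stopped = true
      // Release directories, files, and so on here.
    }
  }

  def isStopped: Boolean = stopped
}
```

With this shape, calling `stop()` during normal operation removes the hook, and calling it while the JVM is shutting down falls through to the same idempotent cleanup instead of failing on `removeShutdownHook`.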


