[GitHub] spark pull request: [SPARK-13992][Core][PySpark][FollowUp] Update ...

2016-04-02 Thread lw-lin
Github user lw-lin commented on the pull request:

https://github.com/apache/spark/pull/12126#issuecomment-204892494
  
@srowen thank you for pointing this out!

Yes, sure, let's generalize "tachyon" to "off-heap" in these and other files;
that has been done in the dedicated PR [[SPARK-14342][Core][Docs][Tests]
Remove straggler references to
Tachyon](https://github.com/apache/spark/pull/12129).

So this PR can stay focused on the semantics update. :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13538][ML] Add GaussianMixture to ML

2016-04-02 Thread zhengruifeng
Github user zhengruifeng commented on the pull request:

https://github.com/apache/spark/pull/11419#issuecomment-204888043
  
@jkbradley I fixed those two issues, and I changed the output type of
clusterSizes from Map[Int, Long] to Array[Long].
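
A minimal sketch of how the two representations relate, assuming cluster indices run from 0 until k (illustrative names only, not the PR's actual code):
```scala
// Hypothetical sketch: convert per-cluster sizes keyed by cluster index
// into an array indexed by cluster, filling missing clusters with 0.
val sizesByCluster: Map[Int, Long] = Map(0 -> 10L, 2 -> 3L, 1 -> 7L)
val k = 3
val clusterSizes: Array[Long] = Array.tabulate(k)(i => sizesByCluster.getOrElse(i, 0L))
// clusterSizes == Array(10L, 7L, 3L)
```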





[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12134#issuecomment-204886268
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54792/
Test PASSed.





[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12134#issuecomment-204886266
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14349] [SQL] Issue Error Messages for U...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12134#issuecomment-204885684
  
**[Test build #54792 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54792/consoleFull)**
 for PR 12134 at commit 
[`4178291`](https://github.com/apache/spark/commit/41782913a7c92666110ed5b63f172dba89b44d7d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14231][SQL] JSON data source infers flo...

2016-04-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12030





[GitHub] spark pull request: [SPARK-14231][SQL] JSON data source infers flo...

2016-04-02 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/12030#issuecomment-204883085
  
LGTM, merging this into master, thanks!





[GitHub] spark pull request: [SPARK-14338][SQL] Improve `SimplifyConditiona...

2016-04-02 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12122#issuecomment-204881448
  
I think I fixed it just now.






[GitHub] spark pull request: [SPARK-14231][SQL] JSON data source infers flo...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12030#issuecomment-204879902
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54791/
Test PASSed.





[GitHub] spark pull request: [SPARK-14231][SQL] JSON data source infers flo...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12030#issuecomment-204879901
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14231][SQL] JSON data source infers flo...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12030#issuecomment-204879875
  
**[Test build #54791 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54791/consoleFull)**
 for PR 12030 at commit 
[`603c2c0`](https://github.com/apache/spark/commit/603c2c08426bc69b04aa1bfd0959a07ccb880839).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14349] [SQL] [WIP] Issue Error Messages...

2016-04-02 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/12134#issuecomment-204879643
  
cc @yhuai @hvanhovell Code is ready for review. Thanks!





[GitHub] spark pull request: [SPARK-14231][SQL] JSON data source infers flo...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12030#issuecomment-204879546
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54790/
Test PASSed.





[GitHub] spark pull request: [SPARK-14231][SQL] JSON data source infers flo...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12030#issuecomment-204879545
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14231][SQL] JSON data source infers flo...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12030#issuecomment-204879516
  
**[Test build #54790 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54790/consoleFull)**
 for PR 12030 at commit 
[`a943159`](https://github.com/apache/spark/commit/a943159455e2dfd56cafaac1d724d76c05b41cc9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14349] [SQL] [WIP] Issue Error Messages...

2016-04-02 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/12134#issuecomment-204876183
  
The same issue was found for create table:
```SQL
CREATE EXTERNAL TABLE parquet_tab2(c1 INT, c2 STRING)
TBLPROPERTIES('prop1Key '= "prop1Val", ' `prop2Key` '= "prop2Val")
```
Strange errors are produced, as shown below.
```
scala.collection.immutable.Map$Map2 cannot be cast to 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
java.lang.ClassCastException: scala.collection.immutable.Map$Map2 cannot be 
cast to org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
```
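
A hedged sketch of the kind of check a test could make once a clear error is raised instead of the ClassCastException above (this assumes a ScalaTest-style suite with `intercept`, the suite's `sql` helper, and an `AnalysisException`; the message text is an assumption, not the PR's actual wording):
```scala
// Hypothetical test sketch; the exception type and message are assumptions.
val e = intercept[AnalysisException] {
  sql(
    """CREATE EXTERNAL TABLE parquet_tab2(c1 INT, c2 STRING)
      |TBLPROPERTIES('prop1Key '= "prop1Val", ' `prop2Key` '= "prop2Val")""".stripMargin)
}
assert(e.getMessage.toLowerCase.contains("unsupported"))
```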





[GitHub] spark pull request: [SPARK-14349] [SQL] [WIP] Issue Error Messages...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12134#issuecomment-204867976
  
**[Test build #54792 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54792/consoleFull)**
 for PR 12134 at commit 
[`4178291`](https://github.com/apache/spark/commit/41782913a7c92666110ed5b63f172dba89b44d7d).





[GitHub] spark pull request: [SPARK-14231][SQL] JSON data source infers flo...

2016-04-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/12030#issuecomment-204864387
  
@davies Actually, I am not sure I understood the last comment correctly.
Would you check the tests, please? I added separate tests both for types that
get merged and for types that are only inferred without merging.





[GitHub] spark pull request: [SPARK-14231][SQL] JSON data source infers flo...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12030#issuecomment-204864352
  
**[Test build #54791 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54791/consoleFull)**
 for PR 12030 at commit 
[`603c2c0`](https://github.com/apache/spark/commit/603c2c08426bc69b04aa1bfd0959a07ccb880839).





[GitHub] spark pull request: [CORE][SPARK-14178]DAGScheduler should get map...

2016-04-02 Thread witgo
Github user witgo closed the pull request at:

https://github.com/apache/spark/pull/11986





[GitHub] spark pull request: [SPARK-14231][SQL] JSON data source infers flo...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12030#issuecomment-204864215
  
**[Test build #54790 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54790/consoleFull)**
 for PR 12030 at commit 
[`a943159`](https://github.com/apache/spark/commit/a943159455e2dfd56cafaac1d724d76c05b41cc9).





[GitHub] spark pull request: [SPARK-14349] [SQL] [WIP] Issue Error Messages...

2016-04-02 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/12134#issuecomment-204861588
  
In SQLContext, the plan is wrong if we use CREATE VIEW AS SELECT. For example, running
```scala
  sql("CREATE VIEW testView AS SELECT * FROM jt").explain(true)
```
produces
```
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation `jt`, None

== Analyzed Logical Plan ==
intType: int, stringType: string, dateType: date, timestampType: timestamp, 
doubleType: double, bigintType: bigint, tinyintType: tinyint, decimalType: 
decimal(10,0), fixedDecimalType: decimal(5,1), binaryType: binary, booleanType: 
boolean, smallIntType: smallint, floatType: float, mapType: map, 
arrayType: array, structType: struct
Project 
[intType#155,stringType#156,dateType#157,timestampType#158,doubleType#159,bigintType#160L,tinyintType#161,decimalType#162,fixedDecimalType#163,binaryType#164,booleanType#165,smallIntType#166,floatType#167,mapType#168,arrayType#169,structType#170]
+- SubqueryAlias jt
   +- 
Relation[intType#155,stringType#156,dateType#157,timestampType#158,doubleType#159,bigintType#160L,tinyintType#161,decimalType#162,fixedDecimalType#163,binaryType#164,booleanType#165,smallIntType#166,floatType#167,mapType#168,arrayType#169,structType#170]
 SimpleDDLScan(1,10,test1)

== Optimized Logical Plan ==

Relation[intType#155,stringType#156,dateType#157,timestampType#158,doubleType#159,bigintType#160L,tinyintType#161,decimalType#162,fixedDecimalType#163,binaryType#164,booleanType#165,smallIntType#166,floatType#167,mapType#168,arrayType#169,structType#170]
 SimpleDDLScan(1,10,test1)

== Physical Plan ==
WholeStageCodegen
:  +- Scan 
SimpleDDLScan(1,10,test1)[intType#155,stringType#156,dateType#157,timestampType#158,doubleType#159,bigintType#160L,tinyintType#161,decimalType#162,fixedDecimalType#163,binaryType#164,booleanType#165,smallIntType#166,floatType#167,mapType#168,arrayType#169,structType#170]
```

The expected plan should look like this:
```
== Parsed Logical Plan ==
'CreateViewAsSelect 
CatalogTable(`testView`,CatalogTableType(VIRTUAL_VIEW),CatalogStorageFormat(None,None,None,None,Map()),List(),List(),List(),0,1459654853237,1459654853237,Map(),Some(SELECT
 * FROM jt),Some(SELECT * FROM jt)), false, false, CREATE VIEW testView AS 
SELECT * FROM jt
+- 'Project [*]
   +- 'UnresolvedRelation `jt`, None

== Analyzed Logical Plan ==

CreateViewAsSelect 
CatalogTable(`default`.`testview`,CatalogTableType(VIRTUAL_VIEW),CatalogStorageFormat(None,None,None,None,Map()),List(),List(),List(),0,1459654853237,1459654853237,Map(),Some(SELECT
 * FROM jt),Some(SELECT * FROM jt)), Project 
[intType#0,stringType#1,dateType#2,timestampType#3,doubleType#4,bigintType#5L,tinyintType#6,decimalType#7,fixedDecimalType#8,binaryType#9,booleanType#10,smallIntType#11,floatType#12,mapType#13,arrayType#14,structType#15],
 false, false

== Optimized Logical Plan ==
CreateViewAsSelect 
CatalogTable(`default`.`testview`,CatalogTableType(VIRTUAL_VIEW),CatalogStorageFormat(None,None,None,None,Map()),List(),List(),List(),0,1459654853237,1459654853237,Map(),Some(SELECT
 * FROM jt),Some(SELECT * FROM jt)), Project 
[intType#0,stringType#1,dateType#2,timestampType#3,doubleType#4,bigintType#5L,tinyintType#6,decimalType#7,fixedDecimalType#8,binaryType#9,booleanType#10,smallIntType#11,floatType#12,mapType#13,arrayType#14,structType#15],
 false, false

== Physical Plan ==
ExecutedCommand CreateViewAsSelect 
CatalogTable(`default`.`testview`,CatalogTableType(VIRTUAL_VIEW),CatalogStorageFormat(None,None,None,None,Map()),List(),List(),List(),0,1459654853237,1459654853237,Map(),Some(SELECT
 * FROM jt),Some(SELECT * FROM jt)), Project 
[intType#0,stringType#1,dateType#2,timestampType#3,doubleType#4,bigintType#5L,tinyintType#6,decimalType#7,fixedDecimalType#8,binaryType#9,booleanType#10,smallIntType#11,floatType#12,mapType#13,arrayType#14,structType#15],
 false, false
```

We must disable it in the parser to make sure CREATE VIEW is allowed only in a
HiveContext.
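
Purely as an illustration of that guard (the types below are self-contained stand-ins, not Spark's actual parser classes), the idea is to reject the statement up front when Hive support is unavailable:
```scala
// Hypothetical, self-contained sketch; none of these definitions are the real
// Spark classes, they only illustrate the proposed parser-level guard.
sealed trait Statement
case class CreateViewAsSelect(viewName: String, query: String) extends Statement
case class PlainSelect(query: String) extends Statement

def validate(stmt: Statement, hiveSupportEnabled: Boolean): Statement = stmt match {
  case v: CreateViewAsSelect if !hiveSupportEnabled =>
    throw new UnsupportedOperationException(
      s"CREATE VIEW ${v.viewName} is only supported when Hive support is enabled")
  case other => other
}
```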





[GitHub] spark pull request: [SPARK-14348][SQL] Support native execution of...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12133#issuecomment-204860955
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54789/
Test PASSed.





[GitHub] spark pull request: [SPARK-14348][SQL] Support native execution of...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12133#issuecomment-204860953
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14348][SQL] Support native execution of...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12133#issuecomment-204860912
  
**[Test build #54789 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54789/consoleFull)**
 for PR 12133 at commit 
[`6b7bf59`](https://github.com/apache/spark/commit/6b7bf59ab6164a6d9021aeb318baa13fb0d32612).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13996][SQL] Add more not null attribute...

2016-04-02 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11810#issuecomment-204857581
  
Thanks for reviewing this.





[GitHub] spark pull request: [SPARK-13996][SQL] Add more not null attribute...

2016-04-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11810





[GitHub] spark pull request: [SPARK-13996][SQL] Add more not null attribute...

2016-04-02 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/11810#issuecomment-204857532
  
Merging this into master, thanks!





[GitHub] spark pull request: [SPARK-13996][SQL] Add more not null attribute...

2016-04-02 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/11810#issuecomment-204857528
  
LGTM





[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12016#issuecomment-204856873
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12016#issuecomment-204856875
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54788/
Test PASSed.





[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12016#issuecomment-204856731
  
**[Test build #54788 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54788/consoleFull)**
 for PR 12016 at commit 
[`4040e0e`](https://github.com/apache/spark/commit/4040e0ec2421d5abe9b89785955e1e3d2215676e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14338][SQL] Improve `SimplifyConditiona...

2016-04-02 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/12122#issuecomment-204855092
  
@dongjoon-hyun Can you take a look at the Scala 2.10 build?





[GitHub] spark pull request: [SPARK-14056] Appends s3 specific configuratio...

2016-04-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11876





[GitHub] spark pull request: [SPARK-14056] Appends s3 specific configuratio...

2016-04-02 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/11876#issuecomment-204852979
  
Merged to master





[GitHub] spark pull request: [SPARK-14349] [SQL] [WIP] Issue Error Messages...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12134#issuecomment-204848403
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54787/
Test PASSed.





[GitHub] spark pull request: [SPARK-14349] [SQL] [WIP] Issue Error Messages...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12134#issuecomment-204848402
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14349] [SQL] [WIP] Issue Error Messages...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12134#issuecomment-204848365
  
**[Test build #54787 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54787/consoleFull)**
 for PR 12134 at commit 
[`62c814d`](https://github.com/apache/spark/commit/62c814d630b2a1e3aa004942f178628a83a40919).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14348][SQL] Support native execution of...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12133#issuecomment-204847481
  
**[Test build #54789 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54789/consoleFull)**
 for PR 12133 at commit 
[`6b7bf59`](https://github.com/apache/spark/commit/6b7bf59ab6164a6d9021aeb318baa13fb0d32612).





[GitHub] spark pull request: [SPARK-14231][SQL] JSON data source infers flo...

2016-04-02 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12030#discussion_r58302753
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/TestJsonData.scala
 ---
@@ -214,6 +214,11 @@ private[json] trait TestJsonData {
   """{"a": {"b": 1}}""" ::
   """{"a": []}""" :: Nil)
 
+  def doubleRecords: RDD[String] =
+sqlContext.sparkContext.parallelize(
+  s"""{"a": 1${"0" * 38}, "b": 0.01, "c": 92233720368547758070, "d": 1.01}""" ::
--- End diff --

Can we have a column that has a different schema (range) on different rows,
so that the types get merged together in the end?
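
For example, a hypothetical pair of records (the field name and values are illustrative, not the PR's actual fixture) where one row fits in a decimal and the next does not, so the inferred types have to be merged:
```scala
// Hypothetical test records: "e" is a small integer on one row and wider than
// DecimalType's 38-digit limit on the next, so the merged type becomes double.
val mixedIntegerRecords: Seq[String] = Seq(
  """{"e": 1}""",
  s"""{"e": 1${"0" * 38}}"""
)
```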





[GitHub] spark pull request: [SPARK-14231][SQL] JSON data source infers flo...

2016-04-02 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12030#discussion_r58302741
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 ---
@@ -773,6 +773,45 @@ class JsonSuite extends QueryTest with 
SharedSQLContext with TestJsonData {
 )
   }
 
+  test("Infer big integers as doubles when it does not fit in decimal") {
+val jsonDF = sqlContext.read
+  .json(doubleRecords)
+  .selectExpr("a", "c")
+
+// The values in `a` field will be doubles as they all do not fit in 
decimal. For `c` field,
+// it will be also doubles as `9.223372036854776E19` can be a decimal 
but `2.0E38` becomes
+// a double as it does not fit in decimal. It makes the type as double 
in this case.
+val expectedSchema = StructType(
+  StructField("a", DoubleType, true) ::
+  StructField("c", DoubleType, true):: Nil)
+
+assert(expectedSchema === jsonDF.schema)
+checkAnswer(
+  jsonDF,
+  Seq(Row(1.0E38D, 9.223372036854776E19), Row(2.0E38D, 2.0E38D))
+)
+  }
+
+  test("Infer floating-point values as doubles when it does not fit in 
decimal") {
+val jsonDF = sqlContext.read
+  .option("prefersDecimal", "true")
+  .json(doubleRecords)
+  .selectExpr("b", "d")
--- End diff --

It's better to check the schema of all columns 
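
A hedged sketch of what asserting on every column could look like in this test (the decimal precisions and scales below are illustrative guesses from the fixture above, not values taken from the PR):
```scala
// Hypothetical full-schema assertion with prefersDecimal enabled; the exact
// expected types are assumptions for illustration only.
val expectedSchema = StructType(
  StructField("a", DoubleType, true) ::          // 1 followed by 38 zeros
  StructField("b", DecimalType(2, 2), true) ::   // 0.01
  StructField("c", DecimalType(20, 0), true) ::  // 92233720368547758070
  StructField("d", DecimalType(3, 2), true) :: Nil)
assert(expectedSchema === jsonDF.schema)
```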





[GitHub] spark pull request: [SPARK-14056] Appends s3 specific configuratio...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11876#issuecomment-204847005
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14056] Appends s3 specific configuratio...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11876#issuecomment-204847006
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54784/
Test PASSed.





[GitHub] spark pull request: [SPARK-14056] Appends s3 specific configuratio...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11876#issuecomment-204846967
  
**[Test build #54784 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54784/consoleFull)**
 for PR 11876 at commit 
[`98eee85`](https://github.com/apache/spark/commit/98eee85d388eac799a1bc06b67d238d3fe60e933).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14123] [WIP] [SQL] Handle CreateFunctio...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12117#issuecomment-204846056
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14123] [WIP] [SQL] Handle CreateFunctio...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12117#issuecomment-204846060
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54785/
Test PASSed.





[GitHub] spark pull request: [SPARK-14123] [WIP] [SQL] Handle CreateFunctio...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12117#issuecomment-204845983
  
**[Test build #54785 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54785/consoleFull)**
 for PR 12117 at commit 
[`3718e61`](https://github.com/apache/spark/commit/3718e613498cbf9a996f52bc3f215f1051f2ae51).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedGenerator(name: String, children: 
Seq[Expression]) extends Generator `





[GitHub] spark pull request: [SPARK-13996][SQL] Add more not null attribute...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11810#issuecomment-204845229
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54786/
Test PASSed.





[GitHub] spark pull request: [SPARK-13996][SQL] Add more not null attribute...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11810#issuecomment-204845226
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13996][SQL] Add more not null attribute...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11810#issuecomment-204844841
  
**[Test build #54786 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54786/consoleFull)**
 for PR 11810 at commit 
[`0c7afe7`](https://github.com/apache/spark/commit/0c7afe751e087b5de3b55ac40d189e10a3742776).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14231][SQL] JSON data source infers flo...

2016-04-02 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12030#discussion_r58302585
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala
 ---
@@ -135,11 +135,25 @@ private[sql] object InferSchema {
   // when we see a Java BigInteger, we use DecimalType.
   case BIG_INTEGER | BIG_DECIMAL =>
 val v = parser.getDecimalValue
-DecimalType(v.precision(), v.scale())
+try {
+  // Creating `DecimalType` here can fail when precision is 
bigger than 38.
+  DecimalType(v.precision(), v.scale())
--- End diff --

It's reasonable to have a decimal "0.01", so we should use the max of precision
and scale as the precision here. Also, it's better not to rely on an exception
for branching:

if (Math.max(v.precision(), v.scale()) <= DecimalType.MAX_PRECISION) {
  DecimalType(Math.max(v.precision(), v.scale()), v.scale())
} else {
  DoubleType
}
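
To spell out why taking the max matters here, a small illustrative check (not part of the original review):
```scala
// For "0.01" java.math.BigDecimal reports precision 1 and scale 2; since a
// DecimalType needs scale <= precision, DecimalType(1, 2) would be rejected
// while DecimalType(2, 2) can hold the value.
val v = new java.math.BigDecimal("0.01")
assert(v.precision() == 1 && v.scale() == 2)
```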





[GitHub] spark pull request: [SPARK-14341] [SQL] Throw exception on unsuppo...

2016-04-02 Thread bomeng
Github user bomeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12125#discussion_r58302569
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -759,6 +761,7 @@ SNAPSHOT: 'SNAPSHOT';
 READ: 'READ';
 WRITE: 'WRITE';
 ONLY: 'ONLY';
+MACRO: 'MACRO';
--- End diff --

Just want to confirm: I thought MACRO is a reserved keyword. Is that right?
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL





[GitHub] spark pull request: [SPARK-12981][SQL] Fix Python UDF extraction f...

2016-04-02 Thread xguo27
Github user xguo27 closed the pull request at:

https://github.com/apache/spark/pull/10935





[GitHub] spark pull request: [SPARK-12981][SQL] Fix Python UDF extraction f...

2016-04-02 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/10935#issuecomment-204840629
  
Sure, @davies. I will close this PR.





[GitHub] spark pull request: [SPARK-14342][Core][Docs][Tests] Remove stragg...

2016-04-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12129





[GitHub] spark pull request: [SPARK-14342][Core][Docs][Tests] Remove stragg...

2016-04-02 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12129#issuecomment-204839111
  
Thanks - merging in master.






[GitHub] spark pull request: [MINOR][DOCS] Use multi-line JavaDoc comments ...

2016-04-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12130





[GitHub] spark pull request: [MINOR][DOCS] Use multi-line JavaDoc comments ...

2016-04-02 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12130#issuecomment-204838383
  
BTW, in the future, for changes of this size we should create a JIRA ticket.

Can you look into whether it is possible to create a scalastyle rule for
this, so we don't use the Scaladoc-style comments in the future?






[GitHub] spark pull request: [SPARK-14338][SQL] Improve `SimplifyConditiona...

2016-04-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12122





[GitHub] spark pull request: [MINOR][DOCS] Use multi-line JavaDoc comments ...

2016-04-02 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12130#issuecomment-204838369
  
Thanks - merging in master.






[GitHub] spark pull request: [SPARK-14338][SQL] Improve `SimplifyConditiona...

2016-04-02 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12122#issuecomment-204838315
  
Thanks - merging in master.






[GitHub] spark pull request: [SPARK-14285][SQL] Implement common type-safe ...

2016-04-02 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12077#discussion_r58302439
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/typedaggregators.scala
 ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.aggregate
+
+import org.apache.spark.sql.expressions.Aggregator
+

+
+// This file defines internal implementations for aggregators.

+
+
+
+class TypedSum[IN, OUT : Numeric](f: IN => OUT) extends Aggregator[IN, 
OUT, OUT] {
+  val numeric = implicitly[Numeric[OUT]]
+  override def zero: OUT = numeric.zero
+  override def reduce(b: OUT, a: IN): OUT = numeric.plus(b, f(a))
+  override def merge(b1: OUT, b2: OUT): OUT = numeric.plus(b1, b2)
+  override def finish(reduction: OUT): OUT = reduction
+}
+
+
+class TypedSumDouble[IN](f: IN => Double) extends Aggregator[IN, Double, 
Double] {
+  override def zero: Double = 0.0
+  override def reduce(b: Double, a: IN): Double = b + f(a)
+  override def merge(b1: Double, b2: Double): Double = b1 + b2
+  override def finish(reduction: Double): Double = reduction
+}
+
+
+class TypedSumLong[IN](f: IN => Long) extends Aggregator[IN, Long, Long] {
--- End diff --

How often do you care about summing up shorts and ints but don't want to 
automatically coerce into longs?
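
A minimal, Spark-free sketch of the point behind the question (the helper 
`sumLong` and the sample values are illustrative, not part of this PR):

```
// Widening can happen in the extractor function, so callers holding Int or
// Short values don't need dedicated Int/Short sum aggregators.
def sumLong[T](xs: Seq[T])(f: T => Long): Long =
  xs.foldLeft(0L)((acc, x) => acc + f(x))

val ints = Seq(1, 2, 3)
val total = sumLong(ints)(_.toLong)  // 6L: the coercion happens at the call site
```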



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12016#issuecomment-204832859
  
**[Test build #54788 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54788/consoleFull)**
 for PR 12016 at commit 
[`4040e0e`](https://github.com/apache/spark/commit/4040e0ec2421d5abe9b89785955e1e3d2215676e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14050][ML] Add multiple languages suppo...

2016-04-02 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/11871#discussion_r58302144
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -123,21 +71,26 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWords.English, caseSensitive -> false)
+  setDefault(stopWords -> Array.empty[String], caseSensitive -> false)
 
   override def transform(dataset: DataFrame): DataFrame = {
+val stopWordsSet = if ($(stopWords).isEmpty) {
+  StopWordsRemover.loadStopWords("english").toSet
+} else {
+  $(stopWords).toSet
+}
+
 val outputSchema = transformSchema(dataset.schema)
 val t = if ($(caseSensitive)) {
-val stopWordsSet = $(stopWords).toSet
-udf { terms: Seq[String] =>
-  terms.filter(s => !stopWordsSet.contains(s))
-}
-  } else {
-val toLower = (s: String) => if (s != null) s.toLowerCase else s
-val lowerStopWords = $(stopWords).map(toLower(_)).toSet
-udf { terms: Seq[String] =>
-  terms.filter(s => !lowerStopWords.contains(toLower(s)))
-}
+  udf { terms: Seq[String] =>
+terms.filter(s => !stopWordsSet.contains(s))
+  }
+} else {
+  val toLower = (s: String) => if (s != null) s.toLowerCase else s
+  val lowerStopWords = stopWordsSet.map(toLower(_)).toSet
--- End diff --

OK, if we don't treat that as an error, then null can be filtered when the 
stopwords set is created. In fact it can be lowercased at that time too.
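
A standalone sketch of what that would look like (not the actual 
StopWordsRemover code; the helper name is illustrative):

```
// Nulls are dropped and, for the case-insensitive path, lower-casing happens
// once when the stop-word set is built rather than once per input term.
def buildStopWordSet(words: Array[String], caseSensitive: Boolean): Set[String] = {
  val nonNull = words.filter(_ != null)
  if (caseSensitive) nonNull.toSet else nonNull.map(_.toLowerCase).toSet
}

buildStopWordSet(Array("The", null, "a"), caseSensitive = false)  // Set("the", "a")
```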


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...

2016-04-02 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/12016#issuecomment-204832447
  
Looks good, thank you. Let's give it one more spin


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...

2016-04-02 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/12016#issuecomment-204832448
  
Jenkins retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14349] [SQL] [WIP] Issue Error Messages...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12134#issuecomment-204832449
  
**[Test build #54787 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54787/consoleFull)**
 for PR 12134 at commit 
[`62c814d`](https://github.com/apache/spark/commit/62c814d630b2a1e3aa004942f178628a83a40919).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14349] [SQL] [WIP] Issue Error Messages...

2016-04-02 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/12134

[SPARK-14349] [SQL] [WIP] Issue Error Messages for Unsupported 
Operators/DML/DDL in SQL Context.

 What changes were proposed in this pull request?

Currently, weird error messages are issued if we use HiveContext-only 
operations in SQLContext. 

For example (a minimal reproduction sketch follows these two cases):
- When calling `Drop Table` in SQL Context, we got the following message:
```
Expected exception org.apache.spark.sql.catalyst.parser.ParseException to 
be thrown, but java.lang.ClassCastException was thrown.
```

- When calling `Script Transform` in SQL Context, we got the message:
```
assertion failed: No plan for ScriptTransformation [key#9,value#10], cat, 
[tKey#155,tValue#156], null
+- LogicalRDD [key#9,value#10], MapPartitionsRDD[3] at beforeAll at 
BeforeAndAfterAll.scala:187
```
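
A hedged reproduction sketch of the first case above (the table name and the 
local-mode setup are made up for illustration): running a Hive-only command 
through a plain SQLContext should raise a clear "unsupported" error instead 
of a low-level ClassCastException.

```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Minimal local setup just to reproduce the behavior described above.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("repro"))
val sqlContext = new SQLContext(sc)
sqlContext.sql("DROP TABLE some_table")  // expected: an explicit AnalysisException / ParseException
```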

 How was this patch tested?
Two test cases are added. 

Not sure if the same issue exists for other operators/DDL/DML. 
@hvanhovell 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark hiveParserCommand

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12134.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12134


commit e09ac7c7981ff8ee405978b019abca1a6a0168cb
Author: gatorsmile 
Date:   2016-04-02T23:51:07Z

unsupported operations in SQL Context.

commit 62c814d630b2a1e3aa004942f178628a83a40919
Author: gatorsmile 
Date:   2016-04-03T00:10:45Z

style fix.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...

2016-04-02 Thread mtustin-handy
Github user mtustin-handy commented on a diff in the pull request:

https://github.com/apache/spark/pull/12016#discussion_r58302077
  
--- Diff: core/src/main/scala/org/apache/spark/partial/BoundedDouble.scala 
---
@@ -21,5 +21,23 @@ package org.apache.spark.partial
  * A Double value with error bars and associated confidence.
  */
 class BoundedDouble(val mean: Double, val confidence: Double, val low: 
Double, val high: Double) {
-  override def toString(): String = "[%.3f, %.3f]".format(low, high)
+  override def toString(): String =
--- End diff --

Done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13996][SQL] Add more not null attribute...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11810#issuecomment-204827883
  
**[Test build #54786 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54786/consoleFull)**
 for PR 11810 at commit 
[`0c7afe7`](https://github.com/apache/spark/commit/0c7afe751e087b5de3b55ac40d189e10a3742776).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13996][SQL] Add more not null attribute...

2016-04-02 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11810#issuecomment-204827703
  
retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14123] [WIP] [SQL] Handle CreateFunctio...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12117#issuecomment-204827237
  
**[Test build #54785 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54785/consoleFull)**
 for PR 12117 at commit 
[`3718e61`](https://github.com/apache/spark/commit/3718e613498cbf9a996f52bc3f215f1051f2ae51).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14123] [WIP] [SQL] Handle CreateFunctio...

2016-04-02 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12117#discussion_r58301786
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -428,53 +432,86 @@ class SessionCatalog(
*/
   def dropFunction(name: FunctionIdentifier): Unit = {
 val db = name.database.getOrElse(currentDb)
+val qualified = name.copy(database = Some(db)).unquotedString
+if (functionRegistry.functionExists(qualified)) {
+  // If we have loaded this function into FunctionRegistry,
+  // also drop it from there.
+  functionRegistry.dropFunction(qualified)
+}
 externalCatalog.dropFunction(db, name.funcName)
   }
 
   /**
-   * Alter a metastore function whose name that matches the one specified 
in `funcDefinition`.
-   *
-   * If no database is specified in `funcDefinition`, assume the function 
is in the
-   * current database.
-   *
-   * Note: If the underlying implementation does not support altering a 
certain field,
-   * this becomes a no-op.
-   */
-  def alterFunction(funcDefinition: CatalogFunction): Unit = {
-val db = funcDefinition.identifier.database.getOrElse(currentDb)
-val newFuncDefinition = funcDefinition.copy(
-  identifier = FunctionIdentifier(funcDefinition.identifier.funcName, 
Some(db)))
-externalCatalog.alterFunction(db, newFuncDefinition)
-  }
--- End diff --

Users cannot alter functions (there is no API exposed for this), so I just 
deleted it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14123] [WIP] [SQL] Handle CreateFunctio...

2016-04-02 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12117#discussion_r58301788
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -491,40 +528,31 @@ class SessionCatalog(
   }
 
   /**
-   * Rename a function.
-   *
-   * If a database is specified in `oldName`, this will rename the 
function in that database.
-   * If no database is specified, this will first attempt to rename a 
temporary function with
-   * the same name, then, if that does not exist, rename the function in 
the current database.
-   *
-   * This assumes the database specified in `oldName` matches the one 
specified in `newName`.
-   */
-  def renameFunction(oldName: FunctionIdentifier, newName: 
FunctionIdentifier): Unit = {
--- End diff --

Users cannot rename a function (there is no API exposed for this), so I just 
deleted it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14123] [WIP] [SQL] Handle CreateFunctio...

2016-04-02 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12117#discussion_r58301771
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala ---
@@ -112,4 +124,121 @@ class HiveSessionCatalog(
 metastoreCatalog.cachedDataSourceTables.getIfPresent(key)
   }
 
+  override def makeFunctionBuilder(funcName: String, className: String): 
FunctionBuilder = {
+makeFunctionBuilder(funcName, Utils.classForName(className))
+  }
+
+  /**
+   * Construct a [[FunctionBuilder]] based on the provided class that 
represents a function.
+   */
+  private def makeFunctionBuilder(name: String, clazz: Class[_]): 
FunctionBuilder = {
+// When we instantiate hive UDF wrapper class, we may throw exception 
if the input
+// expressions don't satisfy the hive UDF, such as type mismatch, 
input number
+// mismatch, etc. Here we catch the exception and throw 
AnalysisException instead.
+(children: Seq[Expression]) => {
+  try {
+if (classOf[UDF].isAssignableFrom(clazz)) {
+  val udf = HiveSimpleUDF(name, new 
HiveFunctionWrapper(clazz.getName), children)
+  udf.dataType // Force it to check input data types.
+  udf
+} else if (classOf[GenericUDF].isAssignableFrom(clazz)) {
+  val udf = HiveGenericUDF(name, new 
HiveFunctionWrapper(clazz.getName), children)
+  udf.dataType // Force it to check input data types.
+  udf
+} else if 
(classOf[AbstractGenericUDAFResolver].isAssignableFrom(clazz)) {
+  val udaf = HiveUDAFFunction(name, new 
HiveFunctionWrapper(clazz.getName), children)
+  udaf.dataType // Force it to check input data types.
+  udaf
+} else if (classOf[UDAF].isAssignableFrom(clazz)) {
+  val udaf = HiveUDAFFunction(
+name,
+new HiveFunctionWrapper(clazz.getName),
+children,
+isUDAFBridgeRequired = true)
+  udaf.dataType  // Force it to check input data types.
+  udaf
+} else if (classOf[GenericUDTF].isAssignableFrom(clazz)) {
+  val udtf = HiveGenericUDTF(name, new 
HiveFunctionWrapper(clazz.getName), children)
+  udtf.elementTypes // Force it to check input data types.
+  udtf
+} else {
+  throw new AnalysisException(s"No handler for Hive UDF 
'${clazz.getCanonicalName}'")
+}
+  } catch {
+case ae: AnalysisException =>
+  throw ae
+case NonFatal(e) =>
+  val analysisException =
+new AnalysisException(s"No handler for Hive UDF 
'${clazz.getCanonicalName}': $e")
+  analysisException.setStackTrace(e.getStackTrace)
+  throw analysisException
+  }
+}
+  }
+
+  // We have a list of Hive built-in functions that we do not support. So, 
we will check
+  // Hive's function registry and lazily load needed functions into our 
own function registry.
+  // Those Hive built-in functions are
+  // assert_true, collect_list, collect_set, compute_stats, 
context_ngrams, create_union,
+  // current_user ,elt, ewah_bitmap, ewah_bitmap_and, ewah_bitmap_empty, 
ewah_bitmap_or, field,
+  // histogram_numeric, in_file, index, inline, java_method, map_keys, 
map_values,
+  // matchpath, ngrams, noop, noopstreaming, noopwithmap, 
noopwithmapstreaming,
+  // parse_url, parse_url_tuple, percentile, percentile_approx, 
posexplode, reflect, reflect2,
+  // regexp, sentences, stack, std, str_to_map, windowingtablefunction, 
xpath, xpath_boolean,
+  // xpath_double, xpath_float, xpath_int, xpath_long, xpath_number,
+  // xpath_short, and xpath_string.
+  override def lookupFunction(name: String, children: Seq[Expression]): 
Expression = {
+Try(super.lookupFunction(name, children)) match {
+  case Success(expr) => expr
+  case Failure(error) =>
+if (functionRegistry.functionExists(name)) {
+  // If the function actually exists in functionRegistry, it means 
that there is an
+  // error when we create the Expression using the given children.
+  // We need to throw the original exception.
+  throw error
--- End diff --

When there is a builder in the function registry, 
`super.lookupFunction(name, children)` can still fail (for example, when we try 
to create an expression for a Hive built-in function but the arguments are not 
valid).
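
An illustrative, simplified sketch of that control flow (plain Scala, not the 
real SessionCatalog/FunctionRegistry API):

```
import scala.util.{Failure, Success, Try}

// Even when a builder is registered, applying it to invalid arguments can
// fail; in that case the original error should propagate instead of a
// misleading "undefined function" message.
def lookup(registry: Map[String, Seq[Int] => String], name: String, args: Seq[Int]): String =
  Try(registry(name)(args)) match {
    case Success(expr) => expr
    case Failure(error) if registry.contains(name) => throw error  // builder exists, arguments were bad
    case Failure(_) => throw new IllegalArgumentException(s"Undefined function: $name")
  }
```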


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-14123] [WIP] [SQL] Handle CreateFunctio...

2016-04-02 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12117#discussion_r58301749
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 
---
@@ -211,10 +241,11 @@ class SQLQuerySuite extends QueryTest with 
SQLTestUtils with TestHiveSingleton {
 checkExistence(sql("describe functioN abcadf"), true,
   "Function: abcadf not found.")
 
-checkExistence(sql("describe functioN  `~`"), true,
-  "Function: ~",
-  "Class: org.apache.hadoop.hive.ql.udf.UDFOPBitNot",
-  "Usage: ~ n - Bitwise not")
+// TODO: Re-enable this test after we fix SPARK-14335.
--- End diff --

The output of this test shows that we are loading a Hive function, but we 
actually have our own implementation of `~`. With this PR, we lazily load 
Hive's built-in functions, and the `describe` command will not trigger that 
load. So, this test will fail with a message like `Undefined function`. 
SPARK-14335 will fix this problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...

2016-04-02 Thread mtustin-handy
Github user mtustin-handy commented on a diff in the pull request:

https://github.com/apache/spark/pull/12016#discussion_r58301694
  
--- Diff: core/src/main/scala/org/apache/spark/partial/BoundedDouble.scala 
---
@@ -21,5 +21,23 @@ package org.apache.spark.partial
  * A Double value with error bars and associated confidence.
  */
 class BoundedDouble(val mean: Double, val confidence: Double, val low: 
Double, val high: Double) {
-  override def toString(): String = "[%.3f, %.3f]".format(low, high)
+  override def toString(): String =
--- End diff --

I definitely can put it back, but the previous toString was just weird - it
only printed the bounds. Anyway, I'll update this in a sec (to go back).
Let me know if you change your mind.

On Saturday, April 2, 2016, Sean Owen  wrote:

> In core/src/main/scala/org/apache/spark/partial/BoundedDouble.scala
> :
>
> > @@ -21,5 +21,23 @@ package org.apache.spark.partial
> >   * A Double value with error bars and associated confidence.
> >   */
> >  class BoundedDouble(val mean: Double, val confidence: Double, val low: 
Double, val high: Double) {
> > -  override def toString(): String = "[%.3f, %.3f]".format(low, high)
> > +  override def toString(): String =
>
> OK, I think this is all good, except I think the toString should be left
> alone. I forgot to mention this. Not that I really expect anyone to depend
> on the format, but let's leave it since it's a public class.
>
> —
> You are receiving this because you authored the thread.
> Reply to this email directly or view it on GitHub
> 

>

-- 
Want to work at Handy? Check out our culture deck and open roles
Latest news at Handy
Handy just raised $50m led by Fidelity

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14050][ML] Add multiple languages suppo...

2016-04-02 Thread burakkose
Github user burakkose commented on a diff in the pull request:

https://github.com/apache/spark/pull/11871#discussion_r58301673
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -123,21 +71,26 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWords.English, caseSensitive -> false)
+  setDefault(stopWords -> Array.empty[String], caseSensitive -> false)
 
   override def transform(dataset: DataFrame): DataFrame = {
+val stopWordsSet = if ($(stopWords).isEmpty) {
+  StopWordsRemover.loadStopWords("english").toSet
+} else {
+  $(stopWords).toSet
+}
+
 val outputSchema = transformSchema(dataset.schema)
 val t = if ($(caseSensitive)) {
-val stopWordsSet = $(stopWords).toSet
-udf { terms: Seq[String] =>
-  terms.filter(s => !stopWordsSet.contains(s))
-}
-  } else {
-val toLower = (s: String) => if (s != null) s.toLowerCase else s
-val lowerStopWords = $(stopWords).map(toLower(_)).toSet
-udf { terms: Seq[String] =>
-  terms.filter(s => !lowerStopWords.contains(toLower(s)))
-}
+  udf { terms: Seq[String] =>
+terms.filter(s => !stopWordsSet.contains(s))
+  }
+} else {
+  val toLower = (s: String) => if (s != null) s.toLowerCase else s
+  val lowerStopWords = stopWordsSet.map(toLower(_)).toSet
--- End diff --

Before my edit, that condition was already there, and I thought the same as 
you. However, a user may still do this:
```
// Other operations could assign the word; this is just an example
val word: String = null
val stopWords = Array(word)
val remover = new StopWordsRemover()
  .setInputCol("raw")
  .setOutputCol("filtered")
  .setStopWords(stopWords)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...

2016-04-02 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12016#discussion_r58301669
  
--- Diff: core/src/main/scala/org/apache/spark/partial/BoundedDouble.scala 
---
@@ -21,5 +21,23 @@ package org.apache.spark.partial
  * A Double value with error bars and associated confidence.
  */
 class BoundedDouble(val mean: Double, val confidence: Double, val low: 
Double, val high: Double) {
-  override def toString(): String = "[%.3f, %.3f]".format(low, high)
+  override def toString(): String =
--- End diff --

OK, I think this is all good, except I think the `toString` should be left 
alone. I forgot to mention this. Not that I really expect anyone to depend on 
the format, but let's leave it since it's a public class. 
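
Concretely, the suggestion is just to keep the historical format shown in the 
diff, e.g.:

```
// Only the bounds are printed, exactly as before, so the public contract of
// this class stays unchanged.
class BoundedDouble(val mean: Double, val confidence: Double, val low: Double, val high: Double) {
  override def toString(): String = "[%.3f, %.3f]".format(low, high)
}
```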


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14056] Appends s3 specific configuratio...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11876#issuecomment-204819781
  
**[Test build #54784 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54784/consoleFull)**
 for PR 11876 at commit 
[`98eee85`](https://github.com/apache/spark/commit/98eee85d388eac799a1bc06b67d238d3fe60e933).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14050][ML] Add multiple languages suppo...

2016-04-02 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/11871#discussion_r58301587
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -123,21 +71,26 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWords.English, caseSensitive -> false)
+  setDefault(stopWords -> Array.empty[String], caseSensitive -> false)
 
   override def transform(dataset: DataFrame): DataFrame = {
+val stopWordsSet = if ($(stopWords).isEmpty) {
+  StopWordsRemover.loadStopWords("english").toSet
+} else {
+  $(stopWords).toSet
+}
+
 val outputSchema = transformSchema(dataset.schema)
 val t = if ($(caseSensitive)) {
-val stopWordsSet = $(stopWords).toSet
-udf { terms: Seq[String] =>
-  terms.filter(s => !stopWordsSet.contains(s))
-}
-  } else {
-val toLower = (s: String) => if (s != null) s.toLowerCase else s
-val lowerStopWords = $(stopWords).map(toLower(_)).toSet
-udf { terms: Seq[String] =>
-  terms.filter(s => !lowerStopWords.contains(toLower(s)))
-}
+  udf { terms: Seq[String] =>
+terms.filter(s => !stopWordsSet.contains(s))
--- End diff --

See question below about null words


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14123] [WIP] [SQL] Handle CreateFunctio...

2016-04-02 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12117#discussion_r58301583
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -453,28 +464,80 @@ class SessionCatalog(
* If a database is specified in `name`, this will return the function 
in that database.
* If no database is specified, this will return the function in the 
current database.
*/
+  // TODO: have a better name. This method is actually for fetching the 
metadata of a function.
   def getFunction(name: FunctionIdentifier): CatalogFunction = {
 val db = name.database.getOrElse(currentDb)
 externalCatalog.getFunction(db, name.funcName)
   }
 
+  /**
+   * Check if a function is already existing.
+   *
+   */
+  def functionExists(name: FunctionIdentifier): Boolean = {
+if (functionRegistry.functionExists(name.unquotedString)) {
+  // This function exists in the FunctionRegistry.
+  true
+} else {
+  // Need to check if this function exists in the metastore.
+  try {
+getFunction(name) != null
+  } catch {
+case _: NoSuchFunctionException => false
+case _: AnalysisException => false // HiveExternalCatalog wraps 
all exceptions with it.
+  }
+}
+  }
 
   // 
   // | Methods that interact with temporary and metastore functions |
   // 
 
+
+  /**
+   * Return a temporary function. For testing only.
+   */
+  private[catalog] def getTempFunction(name: String): 
Option[FunctionBuilder] = {
+// TODO: Why do we need this?
--- End diff --

Will delete it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14056] Appends s3 specific configuratio...

2016-04-02 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/11876#issuecomment-204819086
  
Jenkins retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14050][ML] Add multiple languages suppo...

2016-04-02 Thread burakkose
Github user burakkose commented on a diff in the pull request:

https://github.com/apache/spark/pull/11871#discussion_r58301543
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -123,21 +71,26 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWords.English, caseSensitive -> false)
+  setDefault(stopWords -> Array.empty[String], caseSensitive -> false)
 
   override def transform(dataset: DataFrame): DataFrame = {
+val stopWordsSet = if ($(stopWords).isEmpty) {
+  StopWordsRemover.loadStopWords("english").toSet
+} else {
+  $(stopWords).toSet
+}
+
 val outputSchema = transformSchema(dataset.schema)
 val t = if ($(caseSensitive)) {
-val stopWordsSet = $(stopWords).toSet
-udf { terms: Seq[String] =>
-  terms.filter(s => !stopWordsSet.contains(s))
-}
-  } else {
-val toLower = (s: String) => if (s != null) s.toLowerCase else s
-val lowerStopWords = $(stopWords).map(toLower(_)).toSet
-udf { terms: Seq[String] =>
-  terms.filter(s => !lowerStopWords.contains(toLower(s)))
-}
+  udf { terms: Seq[String] =>
+terms.filter(s => !stopWordsSet.contains(s))
--- End diff --

Yes, I will fix it. Do you have any other suggestions about the pull request, 
such as additional features?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14056] Appends s3 specific configuratio...

2016-04-02 Thread sitalkedia
Github user sitalkedia commented on the pull request:

https://github.com/apache/spark/pull/11876#issuecomment-204817972
  
@srowen - Thanks for taking a look, updated the diff to fix the test case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14056] Appends s3 specific configuratio...

2016-04-02 Thread sitalkedia
Github user sitalkedia commented on a diff in the pull request:

https://github.com/apache/spark/pull/11876#discussion_r58301462
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala 
---
@@ -74,13 +74,12 @@ class SparkHadoopUtil extends Logging {
 }
   }
 
-  /**
-   * Return an appropriate (subclass) of Configuration. Creating config 
can initializes some Hadoop
-   * subsystems.
-   */
-  def newConfiguration(conf: SparkConf): Configuration = {
-val hadoopConf = new Configuration()
 
+  /**
+* Appends S3-specific, spark.hadoop.*, and spark.spark.buffer.size 
configurations to a Hadoop
--- End diff --

@marmbrus - the job runs a simple Hive query to create a table. From what I 
understand of the code, the hiveConf is initialized without the spark.hadoop.* 
configurations, and that hiveConf is then used to initialize the HadoopRDD. So 
the HadoopRDD does not contain any spark.hadoop.* configurations. This fix is 
meant to resolve that issue. 
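
A hedged sketch of the behavior the refactoring is aiming at (not the exact 
SparkHadoopUtil code; the helper name is illustrative):

```
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkConf

// Copy every spark.hadoop.* entry from the SparkConf into an existing Hadoop
// Configuration, e.g. the HiveConf later used to build the HadoopRDD.
def appendSparkHadoopConfigs(conf: SparkConf, hadoopConf: Configuration): Unit = {
  conf.getAll.foreach { case (key, value) =>
    if (key.startsWith("spark.hadoop.")) {
      hadoopConf.set(key.substring("spark.hadoop.".length), value)
    }
  }
}
```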


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14050][ML] Add multiple languages suppo...

2016-04-02 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/11871#discussion_r58301366
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -123,21 +71,26 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWords.English, caseSensitive -> false)
+  setDefault(stopWords -> Array.empty[String], caseSensitive -> false)
 
   override def transform(dataset: DataFrame): DataFrame = {
+val stopWordsSet = if ($(stopWords).isEmpty) {
+  StopWordsRemover.loadStopWords("english").toSet
+} else {
+  $(stopWords).toSet
+}
+
 val outputSchema = transformSchema(dataset.schema)
 val t = if ($(caseSensitive)) {
-val stopWordsSet = $(stopWords).toSet
-udf { terms: Seq[String] =>
-  terms.filter(s => !stopWordsSet.contains(s))
-}
-  } else {
-val toLower = (s: String) => if (s != null) s.toLowerCase else s
-val lowerStopWords = $(stopWords).map(toLower(_)).toSet
-udf { terms: Seq[String] =>
-  terms.filter(s => !lowerStopWords.contains(toLower(s)))
-}
+  udf { terms: Seq[String] =>
+terms.filter(s => !stopWordsSet.contains(s))
--- End diff --

Can you save a reference to the active set of stopwords instead of making 
the list into a set each time? It might be more natural to have a defensive 
copy anyway.
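
A small illustrative sketch of that suggestion (not the actual transformer 
code):

```
// The set is built once as an immutable defensive copy, and the filtering
// closure only ever references that cached set.
class SimpleRemover(stopWords: Array[String]) {
  private val stopWordSet: Set[String] = stopWords.toSet  // defensive, immutable copy

  val filter: Seq[String] => Seq[String] =
    terms => terms.filter(t => !stopWordSet.contains(t))
}
```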


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [MINOR][DOCS] Use multi-line JavaDoc comments ...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12130#issuecomment-204814623
  
**[Test build #2734 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2734/consoleFull)**
 for PR 12130 at commit 
[`51ab3ba`](https://github.com/apache/spark/commit/51ab3ba622120277e7c56c75f0002e5d529d2d06).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12016#issuecomment-204813542
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14123] [WIP] [SQL] Handle CreateFunctio...

2016-04-02 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12117#discussion_r58301126
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ---
@@ -133,6 +133,30 @@ object UnresolvedAttribute {
   }
 }
 
+case class UnresolvedGenerator(
--- End diff --

Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12016#issuecomment-204813543
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54782/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14123] [WIP] [SQL] Handle CreateFunctio...

2016-04-02 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12117#discussion_r58301096
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
 ---
@@ -67,9 +70,14 @@ class SimpleFunctionRegistry extends FunctionRegistry {
   }
 
   override def lookupFunction(name: String, children: Seq[Expression]): 
Expression = {
+val builder = functionBuilders.get(name)
+if (builder.isEmpty) {
+  throw new AnalysisException(s"undefined function $name")
+}
 val func = synchronized {
-  functionBuilders.get(name).map(_._2).getOrElse {
-throw new AnalysisException(s"undefined function $name")
+  Try(builder.map(_._2)) match {
--- End diff --

Will revert them.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12016#issuecomment-204813484
  
**[Test build #54782 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54782/consoleFull)**
 for PR 12016 at commit 
[`5e3c477`](https://github.com/apache/spark/commit/5e3c47762f79b89544360c383db10b3d77411109).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14123] [WIP] [SQL] Handle CreateFunctio...

2016-04-02 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12117#discussion_r58301098
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
 ---
@@ -334,17 +342,18 @@ object FunctionRegistry {
 
   val builtin: SimpleFunctionRegistry = {
 val fr = new SimpleFunctionRegistry
-expressions.foreach { case (name, (info, builder)) => 
fr.registerFunction(name, info, builder) }
+expressions.foreach {
+  case (name, (info, builder)) => fr.registerFunction(name, info, 
builder)
--- End diff --

Will revert most of them.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14348][SQL] Support native execution of...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12133#issuecomment-204813187
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14348][SQL] Support native execution of...

2016-04-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12133#issuecomment-204813188
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54783/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14348][SQL] Support native execution of...

2016-04-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12133#issuecomment-204812704
  
**[Test build #54783 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54783/consoleFull)**
 for PR 12133 at commit 
[`386f492`](https://github.com/apache/spark/commit/386f492533199a4ed35d873d24438a9e83299160).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ShowTablePropertiesCommand(`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14129][SQL] Alter table DDL commands

2016-04-02 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12121#discussion_r58301016
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -195,67 +195,133 @@ case class DropFunction(
 isTemp: Boolean)(sql: String)
   extends NativeDDLCommand(sql) with Logging
 
-/** Rename in ALTER TABLE/VIEW: change the name of a table/view to a 
different name. */
+/**
+ * A command that renames a table/view.
+ *
+ * The syntax of this command is:
+ * {{{
+ *ALTER TABLE table1 RENAME TO table2;
+ *ALTER VIEW view1 RENAME TO view2;
+ * }}}
+ */
 case class AlterTableRename(
 oldName: TableIdentifier,
-newName: TableIdentifier)(sql: String)
-  extends NativeDDLCommand(sql) with Logging
+newName: TableIdentifier)
+  extends RunnableCommand {
+
+  override def run(sqlContext: SQLContext): Seq[Row] = {
+sqlContext.sessionState.catalog.renameTable(oldName, newName)
+Seq.empty[Row]
+  }
 
-/** Set Properties in ALTER TABLE/VIEW: add metadata to a table/view. */
+}
+
+/**
+ * A command that sets table/view properties.
+ *
+ * The syntax of this command is:
+ * {{{
+ *   ALTER TABLE table1 SET TBLPROPERTIES ('key1' = 'val1', 'key2' = 
'val2', ...);
+ *   ALTER VIEW view1 SET TBLPROPERTIES ('key1' = 'val1', 'key2' = 'val2', 
...);
+ * }}}
+ */
 case class AlterTableSetProperties(
 tableName: TableIdentifier,
-properties: Map[String, String])(sql: String)
-  extends NativeDDLCommand(sql) with Logging
+properties: Map[String, String])
+  extends RunnableCommand {
 
-/** Unset Properties in ALTER TABLE/VIEW: remove metadata from a 
table/view. */
+  override def run(sqlContext: SQLContext): Seq[Row] = {
+val catalog = sqlContext.sessionState.catalog
+val table = catalog.getTable(tableName)
+val newProperties = table.properties ++ properties
+// TODO: make this a constant
+if (newProperties.contains("spark.sql.sources.provider")) {
+  throw new AnalysisException(
+"alter table properties is not supported for datasource tables")
+}
--- End diff --

Thinking about the error message, users probably do not really know what data 
source tables and Hive tables are. So, how about we just say something like 
"it is not allowed to modify table properties"?
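
A hedged sketch of that direction (the constant, helper name, and exception 
type are illustrative stand-ins; the real code throws Spark's 
AnalysisException):

```
// Hoist the magic string into a constant and keep the user-facing message
// generic, as suggested above.
val DATASOURCE_PROVIDER_KEY = "spark.sql.sources.provider"

def checkAlterable(tableProperties: Map[String, String]): Unit = {
  if (tableProperties.contains(DATASOURCE_PROVIDER_KEY)) {
    throw new UnsupportedOperationException(
      "It is not allowed to modify the properties of this table.")
  }
}
```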


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14050][ML] Add multiple languages suppo...

2016-04-02 Thread burakkose
Github user burakkose commented on a diff in the pull request:

https://github.com/apache/spark/pull/11871#discussion_r58300978
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -123,21 +71,26 @@ class StopWordsRemover(override val uid: String)
   /** @group getParam */
   def getCaseSensitive: Boolean = $(caseSensitive)
 
-  setDefault(stopWords -> StopWords.English, caseSensitive -> false)
+  setDefault(stopWords -> Array.empty[String], caseSensitive -> false)
 
   override def transform(dataset: DataFrame): DataFrame = {
+val stopWordsSet = if ($(stopWords).isEmpty) {
+  StopWordsRemover.loadStopWords("english").toSet
+} else {
+  $(stopWords).toSet
+}
+
 val outputSchema = transformSchema(dataset.schema)
 val t = if ($(caseSensitive)) {
-val stopWordsSet = $(stopWords).toSet
-udf { terms: Seq[String] =>
-  terms.filter(s => !stopWordsSet.contains(s))
-}
-  } else {
-val toLower = (s: String) => if (s != null) s.toLowerCase else s
-val lowerStopWords = $(stopWords).map(toLower(_)).toSet
-udf { terms: Seq[String] =>
-  terms.filter(s => !lowerStopWords.contains(toLower(s)))
-}
+  udf { terms: Seq[String] =>
+terms.filter(s => !stopWordsSet.contains(s))
--- End diff --

Can you give more information about that case? What would be the best approach, in your view?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14129][SQL] Alter table DDL commands

2016-04-02 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12121#discussion_r58300971
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -195,67 +195,133 @@ case class DropFunction(
 isTemp: Boolean)(sql: String)
   extends NativeDDLCommand(sql) with Logging
 
-/** Rename in ALTER TABLE/VIEW: change the name of a table/view to a 
different name. */
+/**
+ * A command that renames a table/view.
+ *
+ * The syntax of this command is:
+ * {{{
+ *ALTER TABLE table1 RENAME TO table2;
+ *ALTER VIEW view1 RENAME TO view2;
+ * }}}
+ */
 case class AlterTableRename(
 oldName: TableIdentifier,
-newName: TableIdentifier)(sql: String)
-  extends NativeDDLCommand(sql) with Logging
+newName: TableIdentifier)
+  extends RunnableCommand {
+
+  override def run(sqlContext: SQLContext): Seq[Row] = {
+sqlContext.sessionState.catalog.renameTable(oldName, newName)
+Seq.empty[Row]
+  }
 
-/** Set Properties in ALTER TABLE/VIEW: add metadata to a table/view. */
+}
+
+/**
+ * A command that sets table/view properties.
+ *
+ * The syntax of this command is:
+ * {{{
+ *   ALTER TABLE table1 SET TBLPROPERTIES ('key1' = 'val1', 'key2' = 
'val2', ...);
+ *   ALTER VIEW view1 SET TBLPROPERTIES ('key1' = 'val1', 'key2' = 'val2', 
...);
+ * }}}
+ */
 case class AlterTableSetProperties(
 tableName: TableIdentifier,
-properties: Map[String, String])(sql: String)
-  extends NativeDDLCommand(sql) with Logging
+properties: Map[String, String])
+  extends RunnableCommand {
 
-/** Unset Properties in ALTER TABLE/VIEW: remove metadata from a 
table/view. */
+  override def run(sqlContext: SQLContext): Seq[Row] = {
+val catalog = sqlContext.sessionState.catalog
+val table = catalog.getTable(tableName)
+val newProperties = table.properties ++ properties
+// TODO: make this a constant
+if (newProperties.contains("spark.sql.sources.provider")) {
+  throw new AnalysisException(
+"alter table properties is not supported for datasource tables")
+}
--- End diff --

Can we have a test to make sure that we can drop a data source table with 
malformed metadata (e.g. provider does not exist and values of a data source 
option are not valid)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


