date:20161225

[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...

2016-12-25 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16296
  
Just read the design doc. What is the decision about adding `(TBLPROPERTIES 
tablePropertyList)?` in the CREATE TABLE syntax? So far, .g4 file does not have 
it. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...

2016-12-25 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16296
  
There is a syntax difference in partition column definition between Hive 
serde tables and data source tables.  In Hive serde tables, the partitioning 
columns cannot be part of the table schema. Do we need to document the 
difference, or we can assume users understand this when they convert it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...

2016-12-25 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16296
  
We might need another PR for updating the output of `SHOW CREATE TABLE`, 
since we recommend users use the new syntax.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...

2016-12-25 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16296
  
`CREATE TEMPORARY TABLE` is not supported for all the types of hive serde 
tables. However, in `CREATE TEMPORARY TABLE` is allowed for creating data souce 
tables if `AS query` is not specified. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...

2016-12-25 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16296
  
Hive does not allow to use a CTAS statement to create a partitioned table, 
but we allow it in the Create Data Source table syntax. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16391: [SPARK-18990][SQL] make DatasetBenchmark fairer f...

2016-12-25 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16391#discussion_r93826236
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
@@ -170,36 +176,39 @@ object DatasetBenchmark {
 val benchmark3 = aggregate(spark, numRows)
 
 /*
-OpenJDK 64-Bit Server VM 1.8.0_91-b14 on Linux 
3.10.0-327.18.2.el7.x86_64
-Intel Xeon E3-12xx v2 (Ivy Bridge)
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 on Mac OS X 10.12.1
+Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
+
 back-to-back map:Best/Avg Time(ms)
Rate(M/s)   Per Row(ns)   Relative
 

-RDD   3448 / 3646 29.0 
 34.5   1.0X
-DataFrame 2647 / 3116 37.8 
 26.5   1.3X
-Dataset   4781 / 5155 20.9 
 47.8   0.7X
+RDD   3963 / 3976 25.2 
 39.6   1.0X
+DataFrame  826 /  834121.1 
  8.3   4.8X
+Dataset   5178 / 5198 19.3 
 51.8   0.8X
--- End diff --

I noticed that Scala compiler automatically generates primitive version. 
Current Spark eventually calls primitive version thru generic version `Object 
apply(Object)`.

Here is a simple example. When we compile the following sample, we can find 
that the following class is generated by scalac. Scalac automatically generates 
a primitive version `int apply$mcII$sp(int)` that can be called by `int 
apply(int)`.  
We could infer this signature in Catalyst for simple cases.

Of course, I totally agree that the best solution is to analyze byte code 
and turn it into expression. [This 
](https://issues.apache.org/jira/browse/SPARK-14083)was already prototyped. Do 
you think it is good time to make this prototype more robust now?


```java
test("ds") {
  val ds = sparkContext.parallelize((1 to 10), 1).toDS
  ds.map(i => i * 7).show
}

$ javap -c Test\$\$anonfun\$5\$\$anonfun\$apply\$mcV\$sp\$1.class
Compiled from "Test.scala"
public final class 
org.apache.spark.sql.Test$$anonfun$5$$anonfun$apply$mcV$sp$1 extends 
scala.runtime.AbstractFunction1$mcII$sp implements scala.Serializable {
  public static final long serialVersionUID;

  public final int apply(int);
Code:
   0: aload_0
   1: iload_1
   2: invokevirtual #18 // Method apply$mcII$sp:(I)I
   5: ireturn

  public int apply$mcII$sp(int);
Code:
   0: iload_1
   1: bipush7
   3: imul
   4: ireturn

  public final java.lang.Object apply(java.lang.Object);
Code:
   0: aload_0
   1: aload_1
   2: invokestatic  #29 // Method 
scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
   5: invokevirtual #31 // Method apply:(I)I
   8: invokestatic  #35 // Method 
scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
  11: areturn

  public 
org.apache.spark.sql.Test$$anonfun$5$$anonfun$apply$mcV$sp$1(org.apache.spark.sql.Test$$anonfun$5);
Code:
   0: aload_0
   1: invokespecial #42 // Method 
scala/runtime/AbstractFunction1$mcII$sp."":()V
   4: return
}
```



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16388: [SPARK-18989][SQL] DESC TABLE should not fail wit...

2016-12-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16388#discussion_r93827087
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -408,8 +408,8 @@ private[hive] class HiveClientImpl(
 lastAccessTime = h.getLastAccessTime.toLong * 1000,
 storage = CatalogStorageFormat(
   locationUri = shim.getDataLocation(h),
-  inputFormat = Option(h.getInputFormatClass).map(_.getName),
-  outputFormat = Option(h.getOutputFormatClass).map(_.getName),
+  inputFormat = Option(h.getTTable.getSd.getInputFormat),
+  outputFormat = Option(h.getTTable.getSd.getOutputFormat),
--- End diff --

When we actually read the hive table, we still use `getInputFormatClass`. 
So this will only affect the `DESC TABLE`, and should be OK?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16397: [WIP][SPARK-18922][TESTS] Fix more path-related t...

2016-12-25 Thread HyukjinKwon

GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/16397

[WIP][SPARK-18922][TESTS] Fix more path-related test failures on Windows

## What changes were proposed in this pull request?

This PR proposes to fix the test failures due to different format of paths 
on Windows.

Failed tests are as below:

```
ColumnExpressionSuite:
- input_file_name, input_file_block_start, input_file_block_length - 
FileScanRDD *** FAILED *** (187 milliseconds)
  
"file:///C:/projects/spark/target/tmp/spark-0b21b963-6cfa-411c-8d6f-e6a5e1e73bce/part-1-c083a03a-e55e-4b05-9073-451de352d006.snappy.parquet"
 did not contain 
"C:\projects\spark\target\tmp\spark-0b21b963-6cfa-411c-8d6f-e6a5e1e73bce" 
(ColumnExpressionSuite.scala:545)
  
- input_file_name, input_file_block_start, input_file_block_length - 
HadoopRDD *** FAILED *** (172 milliseconds)
  
"file:/C:/projects/spark/target/tmp/spark-5d0afa94-7c2f-463b-9db9-2e8403e2bc5f/part-0-f6530138-9ad3-466d-ab46-0eeb6f85ed0b.txt"
 did not contain 
"C:\projects\spark\target\tmp\spark-5d0afa94-7c2f-463b-9db9-2e8403e2bc5f" 
(ColumnExpressionSuite.scala:569)

- input_file_name, input_file_block_start, input_file_block_length - 
NewHadoopRDD *** FAILED *** (156 milliseconds)
  
"file:/C:/projects/spark/target/tmp/spark-a894c7df-c74d-4d19-82a2-a04744cb3766/part-0-29674e3f-3fcf-4327-9b04-4dab1d46338d.txt"
 did not contain 
"C:\projects\spark\target\tmp\spark-a894c7df-c74d-4d19-82a2-a04744cb3766" 
(ColumnExpressionSuite.scala:598)
```

```
DataStreamReaderWriterSuite:
- source metadataPath *** FAILED *** (62 milliseconds)
  org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: 
Argument(s) are different! Wanted:
streamSourceProvider.createSource(
org.apache.spark.sql.SQLContext@3b04133b,

"C:\projects\spark\target\tmp\streaming.metadata-b05db6ae-c8dc-4ce4-b0d9-1eb8c84876c0/sources/0",
None,
"org.apache.spark.sql.streaming.test",
Map()
);
-> at 
org.apache.spark.sql.streaming.test.DataStreamReaderWriterSuite$$anonfun$12.apply$mcV$sp(DataStreamReaderWriterSuite.scala:374)
Actual invocation has different arguments:
streamSourceProvider.createSource(
org.apache.spark.sql.SQLContext@3b04133b,

"/C:/projects/spark/target/tmp/streaming.metadata-b05db6ae-c8dc-4ce4-b0d9-1eb8c84876c0/sources/0",
None,
"org.apache.spark.sql.streaming.test",
Map()
);
```

```
GlobalTempViewSuite:
- CREATE GLOBAL TEMP VIEW USING *** FAILED *** (110 milliseconds)
  org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/C:projectsspark  arget mpspark-960398ba-a0a1-45f6-a59a-d98533f9f519;
```

```
CreateTableAsSelectSuite:
- CREATE TABLE USING AS SELECT *** FAILED *** (0 milliseconds)
  java.lang.IllegalArgumentException: Can not create a Path from an empty 
string

- create a table, drop it and create another one with the same name *** 
FAILED *** (16 milliseconds)
  java.lang.IllegalArgumentException: Can not create a Path from an empty 
string

- create table using as select - with partitioned by *** FAILED *** (0 
milliseconds)
  java.lang.IllegalArgumentException: Can not create a Path from an empty 
string

- create table using as select - with non-zero buckets *** FAILED *** (0 
milliseconds)
  java.lang.IllegalArgumentException: Can not create a Path from an empty 
string
```

```
HiveMetadataCacheSuite:
- partitioned table is cached when partition pruning is true *** FAILED *** 
(532 milliseconds)
  org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: 
MetaException(message:java.lang.IllegalArgumentException: Can not create a Path 
from an empty string);

- partitioned table is cached when partition pruning is false *** FAILED 
*** (297 milliseconds)
  org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: 
MetaException(message:java.lang.IllegalArgumentException: Can not create a Path 
from an empty string);
```

```
MultiDatabaseSuite:
- createExternalTable() to non-default database - with USE *** FAILED *** 
(954 milliseconds)
  org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/C:projectsspark  arget mpspark-0839d9a7-5e29-467a-9e3e-3e4cd618ee09;

- createExternalTable() to non-default database - without USE *** FAILED 
*** (500 milliseconds)
  org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/C:projectsspark  arget mpspark-c7e24d73-1d8f-45e8-ab7d-53a83087aec3;

 - invalid database name and table names *** FAILED *** (31 milliseconds)
   "Path does not exist: file:/C:projectsspark  arget 
mpspark-15a2a494-3483-4876-80e5-ec396e704b77;" did not contain "`t:a` is not a 
valid name

[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16397
  
**[Test build #70572 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70572/testReport)**
 for PR 16397 at commit 
[`0322689`](https://github.com/apache/spark/commit/03226898cb67e6087bb72faef69d42d3a1a80201).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...

2016-12-25 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
Build started: [TESTS] `ALL` 
[![PR-16397](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=54CDC9DA-B59F-48CA-B6A8-23262A166C91&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/54CDC9DA-B59F-48CA-B6A8-23262A166C91)
Diff: 
https://github.com/apache/spark/compare/master...spark-test:54CDC9DA-B59F-48CA-B6A8-23262A166C91

(This will fail because there are other test failures and the build above 
runs all tests.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16398: [SPARK-18842][TESTS] De-duplicate paths in classp...

2016-12-25 Thread HyukjinKwon

GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/16398

[SPARK-18842][TESTS] De-duplicate paths in classpaths in processes for 
local-cluster mode in ReplSuite to work around the length limitation on Windows

## What changes were proposed in this pull request?

`ReplSuite`s hang due to the length limitation on Windows with the 
exception as below:

```
Spark context available as 'sc' (master = local-cluster[1,1,1024], app id = 
app-20161223114000-).
Spark session available as 'spark'.
Exception in thread "ExecutorRunner for app-20161223114000-/26995" 
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:3332)
at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
at 
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:622)
at java.lang.StringBuilder.append(StringBuilder.java:202)
at java.lang.ProcessImpl.createCommandLine(ProcessImpl.java:194)
at java.lang.ProcessImpl.(ProcessImpl.java:340)
at java.lang.ProcessImpl.start(ProcessImpl.java:137)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
at 
org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:167)
at 
org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
```

The reason is, it uses the paths as URLs in the tests whereas some added 
afterward are normal local paths. So, many paths are duplicated because normal 
local paths and URLs are mixed. This length is up to 40K which hits the length 
limitation problem (32K) on Windows.

The full command line built here is - 
https://gist.github.com/HyukjinKwon/46af7946c9a5fd4c6fc70a8a0aba1beb

## How was this patch tested?

Manually via AppVeyor.

**Before**
https://ci.appveyor.com/project/spark-test/spark/build/395-find-path-issues

**After**
https://ci.appveyor.com/project/spark-test/spark/build/398-find-path-issues

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-18842-more

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16398.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16398


commit ac872264b59672a077aafc55f23a0705adf23f37
Author: hyukjinkwon 
Date:   2016-12-25T13:12:47Z

De-duplicate paths in classpaths in processes for local-cluster mode in 
ReplSuite to work around the length limitation on Windows




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16398: [SPARK-18842][TESTS] De-duplicate paths in classpaths in...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16398
  
**[Test build #70573 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70573/testReport)**
 for PR 16398 at commit 
[`ac87226`](https://github.com/apache/spark/commit/ac872264b59672a077aafc55f23a0705adf23f37).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16398: [SPARK-18842][TESTS] De-duplicate paths in classpaths in...

2016-12-25 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16398
  
Build started: [TESTS] `org.apache.spark.repl.ReplSuite` 
[![PR-16398](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=288FF0F4-F91E-4FE6-89F6-C8C8D9F47A5B&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/288FF0F4-F91E-4FE6-89F6-C8C8D9F47A5B)
Diff: 
https://github.com/apache/spark/compare/master...spark-test:288FF0F4-F91E-4FE6-89F6-C8C8D9F47A5B


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16398: [SPARK-18842][TESTS] De-duplicate paths in classpaths in...

2016-12-25 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16398
  
cc @srowen, this is a similar problem in the last PR in this JIRA, which 
make the test hanging. I think this is the last one. Could I please ask to take 
a look?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...

2016-12-25 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
Note to myself, there should be more but I could not identify them all 
although I believe these are almost all, because some errors are suppressed and 
there are still many test failures including these which make finding more 
harder.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...

2016-12-25 Thread philipphoffmann

Github user philipphoffmann commented on the issue:

https://github.com/apache/spark/pull/14936
  
will do ...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16397: [WIP][SPARK-18922][TESTS] Fix more path-related t...

2016-12-25 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16397#discussion_r93829539
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala
 ---
@@ -221,7 +223,7 @@ class HiveCommandSuite extends QueryTest with 
SQLTestUtils with TestHiveSingleto
   // file://path/to/data/files/employee.dat
   //
   // TODO: need a similar test for non-local mode.
-  if (local) {
+  if (local && !Utils.isWindows) {
--- End diff --

This is being skipped because `incorrectUri` below becomes 
`file://path/to/data/files/employee.dat` (or 
`file://C:/path/to/data/files/employee.dat` on Windows).
This seems checking if the file exists or not with `uri.getPath` assuming 
from 
[tables.scala#L223-L248](https://github.com/apache/spark/blob/5572ccf86b084eb5938fe62fd5d9973ec14d555d/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L223-L248).

In Linux/Mac, it seems `uri.getPath` becomes `/to/data/files/employee.dat` 
where as `/path/to/data/files/employee.dat` on Windows.

The former path does not exists on Linux/Mac but the latter on Windows 
seems fine as the path seems correctly implicitly adding `C:`, meaning on 
Windows, these below seem fine:

```scala
new File("/C:/a/b/c").exists
new File("/a/b/c").exists
```

Therefore, the test below: 

```
intercept[AnalysisException] {
   sql(s"""LOAD DATA LOCAL INPATH "$incorrectUri" INTO TABLE 
non_part_table""")
}
```

dose not throw an exception on Windows because `incoorectURI` seems fine in 
this path. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16397: [WIP][SPARK-18922][TESTS] Fix more path-related t...

2016-12-25 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16397#discussion_r93829600
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala
 ---
@@ -221,7 +223,7 @@ class HiveCommandSuite extends QueryTest with 
SQLTestUtils with TestHiveSingleto
   // file://path/to/data/files/employee.dat
   //
   // TODO: need a similar test for non-local mode.
-  if (local) {
+  if (local && !Utils.isWindows) {
--- End diff --

Let me fix this to test on Windows too after the test above is being 
finished.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...

2016-12-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16397
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16397
  
**[Test build #70572 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70572/testReport)**
 for PR 16397 at commit 
[`0322689`](https://github.com/apache/spark/commit/03226898cb67e6087bb72faef69d42d3a1a80201).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...

2016-12-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16397
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70572/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Optimize BlockMatrix multiplica...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15730
  
**[Test build #70574 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70574/testReport)**
 for PR 15730 at commit 
[`13ccfff`](https://github.com/apache/spark/commit/13ccfff7b2bf85671511e23e807244d8abd3f9d4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16398: [SPARK-18842][TESTS] De-duplicate paths in classpaths in...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16398
  
**[Test build #70573 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70573/testReport)**
 for PR 16398 at commit 
[`ac87226`](https://github.com/apache/spark/commit/ac872264b59672a077aafc55f23a0705adf23f37).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16398: [SPARK-18842][TESTS] De-duplicate paths in classpaths in...

2016-12-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16398
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70573/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16398: [SPARK-18842][TESTS] De-duplicate paths in classpaths in...

2016-12-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16398
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16389: [SPARK-18981][Core]The job hang problem when speculation...

2016-12-25 Thread zhaorongsheng

Github user zhaorongsheng commented on the issue:

https://github.com/apache/spark/pull/16389
  
@mridulm Please check it. Thanks~


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Optimize BlockMatrix multiplica...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15730
  
**[Test build #70574 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70574/testReport)**
 for PR 15730 at commit 
[`13ccfff`](https://github.com/apache/spark/commit/13ccfff7b2bf85671511e23e807244d8abd3f9d4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Optimize BlockMatrix multiplica...

2016-12-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15730
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Optimize BlockMatrix multiplica...

2016-12-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15730
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70574/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16233: [SPARK-18801][SQL] Add `View` operator to help resolve a...

2016-12-25 Thread jiangxb1987

Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/16233
  
I'm working on the last option approach, I hope I could finish that in one 
or two more days.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16388: [SPARK-18989][SQL] DESC TABLE should not fail wit...

2016-12-25 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16388#discussion_r93832614
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -408,8 +408,8 @@ private[hive] class HiveClientImpl(
 lastAccessTime = h.getLastAccessTime.toLong * 1000,
 storage = CatalogStorageFormat(
   locationUri = shim.getDataLocation(h),
-  inputFormat = Option(h.getInputFormatClass).map(_.getName),
-  outputFormat = Option(h.getOutputFormatClass).map(_.getName),
+  inputFormat = Option(h.getTTable.getSd.getInputFormat),
+  outputFormat = Option(h.getTTable.getSd.getOutputFormat),
--- End diff --

After more readings, `getTTable.getSd.getInputFormat` and 
`getTTable.getSd.getOutputFormat` will be null for non-native Hive tables, 
e.g., JDBC tables, HBase tables and Cassandra tables. See the link for more 
details: https://cwiki.apache.org/confluence/display/Hive/StorageHandlers

So far, this is OK. I am just afraid we might expand the usage of 
`getTableOption` in the future. Maybe at least document the restrictions?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16309: [WIP][SPARK-18896][TESTS] Update to ScalaTest 3.0.1

2016-12-25 Thread jaceklaskowski

Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/16309
  
For reference: [scala-xml 
releases](https://github.com/scala/scala-xml/releases)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16399: [SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [B...

2016-12-25 Thread gatorsmile

GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/16399

[SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [BACKPORT-2.1] CTAS for hive 
serde table should work for all hive versions AND Drop Staging Directories and 
Data Files

### What changes were proposed in this pull request?

This PR is to backport https://github.com/apache/spark/pull/15744, 
https://github.com/apache/spark/pull/16104 and 
https://github.com/apache/spark/pull/16134. 

--
[[SPARK-18237][HIVE] hive.exec.stagingdir have no effect
](https://github.com/apache/spark/pull/15744)

hive.exec.stagingdir have no effect in spark2.0.1ï¼
Hive confs in hive-site.xml will be loaded in hadoopConf, so we should use 
hadoopConf in InsertIntoHiveTable instead of SessionState.conf

--
[[SPARK-18675][SQL] CTAS for hive serde table should work for all hive 
versions](https://github.com/apache/spark/pull/16104)


Before hive 1.1, when inserting into a table, hive will create the staging 
directory under a common scratch directory. After the writing is finished, hive 
will simply empty the table directory and move the staging directory to it.

After hive 1.1, hive will create the staging directory under the table 
directory, and when moving staging directory to table directory, hive will 
still empty the table directory, but will exclude the staging directory there.

In `InsertIntoHiveTable`, we simply copy the code from hive 1.2, which 
means we will always create the staging directory under the table directory, no 
matter what the hive version is. This causes problems if the hive version is 
prior to 1.1, because the staging directory will be removed by hive when hive 
is trying to empty the table directory.

This PR copies the code from hive 0.13, so that we have 2 branches to 
create staging directory. If hive version is prior to 1.1, we'll go to the old 
style branch(i.e. create the staging directory under a common scratch 
directory), else, go to the new style branch(i.e. create the staging directory 
under the table directory)

--
[[SPARK-18703] [SQL] Drop Staging Directories and Data Files After each 
Insertion/CTAS of Hive serde Tables](https://github.com/apache/spark/pull/16134)

Below are the files/directories generated for three inserts againsts a Hive 
table:
```

/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1

/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1

/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/._SUCCESS.crc

/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/.part-0.crc

/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/_SUCCESS

/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/part-0

/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1

/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1

/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/._SUCCESS.crc

/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/.part-0.crc

/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/_SUCCESS

/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/part-0

/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1

/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-5

[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-12-25 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/13758
  
I think that it is good time to close this when #13909 is closed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16399: [SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [BACKPORT...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16399
  
**[Test build #70575 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70575/consoleFull)**
 for PR 16399 at commit 
[`2482cdc`](https://github.com/apache/spark/commit/2482cdce5680ca5c9754fc759d18e4fefa3d8cd5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16370: [SPARK-18960][SQL][SS] Avoid double reading file which i...

2016-12-25 Thread uncleGen

Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16370
  
@zsxwing Is there any farther feedbackï¼


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16320
  
**[Test build #70576 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70576/testReport)**
 for PR 16320 at commit 
[`308de12`](https://github.com/apache/spark/commit/308de12950599a6900766a76a0ea39ac72aba59f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15664
  
**[Test build #70577 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70577/testReport)**
 for PR 15664 at commit 
[`11f5874`](https://github.com/apache/spark/commit/11f587465c257ba194a157b57244f53ff5eb47fd).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16399: [SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [BACKPORT...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16399
  
**[Test build #70575 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70575/consoleFull)**
 for PR 16399 at commit 
[`2482cdc`](https://github.com/apache/spark/commit/2482cdce5680ca5c9754fc759d18e4fefa3d8cd5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16399: [SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [BACKPORT...

2016-12-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16399
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16399: [SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [BACKPORT...

2016-12-25 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16399
  
cc @cloud-fan 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16399: [SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [BACKPORT...

2016-12-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16399
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70575/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16399: [SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [B...

2016-12-25 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16399#discussion_r93838816
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
 ---
@@ -55,7 +55,10 @@ case class InsertIntoHiveTable(
 
   def output: Seq[Attribute] = Seq.empty
 
-  val stagingDir = sessionState.conf.getConfString("hive.exec.stagingdir", 
".hive-staging")
+  val hadoopConf = sessionState.newHadoopConf()
--- End diff --

https://github.com/apache/spark/pull/15744 needs to be backported too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-25 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93839456
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +57,107 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val (preprocess, assigns, postprocess, arrayData, array) =
+  GenArrayData.genCodeToCreateArrayData(ctx, et, evals, true)
+ev.copy(
+  code = preprocess + ctx.splitExpressions(ctx.INPUT_ROW, assigns) + 
postprocess,
+  value = arrayData,
+  isNull = "false")
   }
 
   override def prettyName: String = "array"
 }
 
+private [sql] object GenArrayData {
+  /**
+   * Return Java code pieces based on DataType and isPrimitive to allocate 
ArrayData class
+   *
+   * @param ctx a [[CodegenContext]]
+   * @param elementType data type of an underlying array
--- End diff --

data type of underlying array element?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-25 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93839460
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +57,107 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val (preprocess, assigns, postprocess, arrayData, array) =
+  GenArrayData.genCodeToCreateArrayData(ctx, et, evals, true)
+ev.copy(
+  code = preprocess + ctx.splitExpressions(ctx.INPUT_ROW, assigns) + 
postprocess,
+  value = arrayData,
+  isNull = "false")
   }
 
   override def prettyName: String = "array"
 }
 
+private [sql] object GenArrayData {
+  /**
+   * Return Java code pieces based on DataType and isPrimitive to allocate 
ArrayData class
+   *
+   * @param ctx a [[CodegenContext]]
+   * @param elementType data type of an underlying array
+   * @param elementsCode a set of [[ExprCode]] for each element of an 
underlying array
+   * @return (code pre-assignments, assignments to each array elements, 
code post-assignments,
--- End diff --

No param doc for `allowNull`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-25 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93839535
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +57,107 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val (preprocess, assigns, postprocess, arrayData, array) =
+  GenArrayData.genCodeToCreateArrayData(ctx, et, evals, true)
+ev.copy(
+  code = preprocess + ctx.splitExpressions(ctx.INPUT_ROW, assigns) + 
postprocess,
+  value = arrayData,
+  isNull = "false")
   }
 
   override def prettyName: String = "array"
 }
 
+private [sql] object GenArrayData {
+  /**
+   * Return Java code pieces based on DataType and isPrimitive to allocate 
ArrayData class
+   *
+   * @param ctx a [[CodegenContext]]
+   * @param elementType data type of an underlying array
+   * @param elementsCode a set of [[ExprCode]] for each element of an 
underlying array
+   * @return (code pre-assignments, assignments to each array elements, 
code post-assignments,
+   *   arrayData name, underlying array name)
+   */
+  def genCodeToCreateArrayData(
+  ctx: CodegenContext,
+  elementType: DataType,
+  elementsCode: Seq[ExprCode],
+  allowNull: Boolean): (String, Seq[String], String, String, String) = 
{
+val arrayName = ctx.freshName("array")
+val arrayDataName = ctx.freshName("arrayData")
+val numElements = elementsCode.length
+
+if (!ctx.isPrimitiveType(elementType)) {
+  val arrayClass = classOf[ArrayData].getName
+  val genericArrayClass = classOf[GenericArrayData].getName
+  ctx.addMutableState("Object[]", arrayName,
+s"this.$arrayName = new Object[${numElements}];")
+
+  val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+val isNullAssignment = if (allowNull) {
+  s"$arrayName[$i] = null;"
+} else {
+  "throw new RuntimeException(\"Cannot use null as map key!\");"
+}
+eval.code + s"""
+ if (${eval.isNull}) {
+   $isNullAssignment
+ } else {
+   $arrayName[$i] = ${eval.value};
+ }
+   """
+  }
+
+  /*
+TODO: When we declare arrayDataName as GenericArrayData,
--- End diff --

Is this TODO valid now?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-25 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93839620
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +57,107 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val (preprocess, assigns, postprocess, arrayData, array) =
--- End diff --

Why return `array`? I don't see you use it later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-25 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93839767
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +57,107 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val (preprocess, assigns, postprocess, arrayData, array) =
+  GenArrayData.genCodeToCreateArrayData(ctx, et, evals, true)
+ev.copy(
+  code = preprocess + ctx.splitExpressions(ctx.INPUT_ROW, assigns) + 
postprocess,
+  value = arrayData,
+  isNull = "false")
   }
 
   override def prettyName: String = "array"
 }
 
+private [sql] object GenArrayData {
+  /**
+   * Return Java code pieces based on DataType and isPrimitive to allocate 
ArrayData class
+   *
+   * @param ctx a [[CodegenContext]]
+   * @param elementType data type of an underlying array
+   * @param elementsCode a set of [[ExprCode]] for each element of an 
underlying array
+   * @return (code pre-assignments, assignments to each array elements, 
code post-assignments,
+   *   arrayData name, underlying array name)
+   */
+  def genCodeToCreateArrayData(
+  ctx: CodegenContext,
+  elementType: DataType,
+  elementsCode: Seq[ExprCode],
+  allowNull: Boolean): (String, Seq[String], String, String, String) = 
{
+val arrayName = ctx.freshName("array")
+val arrayDataName = ctx.freshName("arrayData")
+val numElements = elementsCode.length
+
+if (!ctx.isPrimitiveType(elementType)) {
+  val arrayClass = classOf[ArrayData].getName
+  val genericArrayClass = classOf[GenericArrayData].getName
+  ctx.addMutableState("Object[]", arrayName,
+s"this.$arrayName = new Object[${numElements}];")
+
+  val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+val isNullAssignment = if (allowNull) {
+  s"$arrayName[$i] = null;"
+} else {
+  "throw new RuntimeException(\"Cannot use null as map key!\");"
+}
+eval.code + s"""
+ if (${eval.isNull}) {
+   $isNullAssignment
+ } else {
+   $arrayName[$i] = ${eval.value};
+ }
+   """
+  }
+
+  /*
+TODO: When we declare arrayDataName as GenericArrayData,
--- End diff --

I think that we could have optimization opportunities if we would update 
Janino. I am planning to submit a PR to Janino.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-25 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93839797
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +57,107 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val (preprocess, assigns, postprocess, arrayData, array) =
+  GenArrayData.genCodeToCreateArrayData(ctx, et, evals, true)
+ev.copy(
+  code = preprocess + ctx.splitExpressions(ctx.INPUT_ROW, assigns) + 
postprocess,
+  value = arrayData,
+  isNull = "false")
   }
 
   override def prettyName: String = "array"
 }
 
+private [sql] object GenArrayData {
+  /**
+   * Return Java code pieces based on DataType and isPrimitive to allocate 
ArrayData class
+   *
+   * @param ctx a [[CodegenContext]]
+   * @param elementType data type of an underlying array
+   * @param elementsCode a set of [[ExprCode]] for each element of an 
underlying array
+   * @return (code pre-assignments, assignments to each array elements, 
code post-assignments,
+   *   arrayData name, underlying array name)
+   */
+  def genCodeToCreateArrayData(
+  ctx: CodegenContext,
+  elementType: DataType,
+  elementsCode: Seq[ExprCode],
+  allowNull: Boolean): (String, Seq[String], String, String, String) = 
{
+val arrayName = ctx.freshName("array")
+val arrayDataName = ctx.freshName("arrayData")
+val numElements = elementsCode.length
+
+if (!ctx.isPrimitiveType(elementType)) {
+  val arrayClass = classOf[ArrayData].getName
+  val genericArrayClass = classOf[GenericArrayData].getName
+  ctx.addMutableState("Object[]", arrayName,
+s"this.$arrayName = new Object[${numElements}];")
+
+  val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+val isNullAssignment = if (allowNull) {
+  s"$arrayName[$i] = null;"
+} else {
+  "throw new RuntimeException(\"Cannot use null as map key!\");"
+}
+eval.code + s"""
+ if (${eval.isNull}) {
+   $isNullAssignment
+ } else {
+   $arrayName[$i] = ${eval.value};
+ }
+   """
+  }
+
+  /*
+TODO: When we declare arrayDataName as GenericArrayData,
+   we have to solve the following exception
+  https://github.com/apache/spark/pull/13909/files#r93813725
+  */
+  ("",
+   assignments,
+   s"final $arrayClass $arrayDataName = new 
$genericArrayClass($arrayName);",
+   arrayDataName,
+   arrayName)
+} else {
+  val unsafeArrayClass = classOf[UnsafeArrayData].getName
+  val unsafeArraySizeInBytes =
+UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
+
ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * 
numElements)
+  val baseOffset = Platform.BYTE_ARRAY_OFFSET
+  ctx.addMutableState(unsafeArrayClass, arrayDataName, "");
+
+  val primitiveValueTypeName = ctx.primitiveTypeName(elementType)
+  val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+val isNullAssignment = if (allowNull) {
+  s"$arrayDataName.setNullAt($i);"
+} else {
+  "throw new RuntimeException(\"Cannot use null as map key!\");"
+}
+eval.code + s"""
+ if (${eval.isNull}) {
+   $isNullAssignment
+ } else {
+   $arrayDataName.set$primitiveValueTypeName($i, ${eval.value});
+ }
+   """
+  }
+
+  (s"""
+byte[] $arrayName = new byte[$unsafeArray

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-25 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93839843
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +57,107 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val (preprocess, assigns, postprocess, arrayData, array) =
--- End diff --

Ah, i noticed that `_` can be used for return value. It seems to be better.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-25 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13909
  
LGTM except for few minor comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-25 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93841112
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +57,107 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val (preprocess, assigns, postprocess, arrayData, array) =
+  GenArrayData.genCodeToCreateArrayData(ctx, et, evals, true)
+ev.copy(
+  code = preprocess + ctx.splitExpressions(ctx.INPUT_ROW, assigns) + 
postprocess,
+  value = arrayData,
+  isNull = "false")
   }
 
   override def prettyName: String = "array"
 }
 
+private [sql] object GenArrayData {
+  /**
+   * Return Java code pieces based on DataType and isPrimitive to allocate 
ArrayData class
+   *
+   * @param ctx a [[CodegenContext]]
+   * @param elementType data type of an underlying array
+   * @param elementsCode a set of [[ExprCode]] for each element of an 
underlying array
+   * @return (code pre-assignments, assignments to each array elements, 
code post-assignments,
+   *   arrayData name, underlying array name)
+   */
+  def genCodeToCreateArrayData(
+  ctx: CodegenContext,
+  elementType: DataType,
+  elementsCode: Seq[ExprCode],
+  allowNull: Boolean): (String, Seq[String], String, String, String) = 
{
+val arrayName = ctx.freshName("array")
+val arrayDataName = ctx.freshName("arrayData")
+val numElements = elementsCode.length
+
+if (!ctx.isPrimitiveType(elementType)) {
+  val arrayClass = classOf[ArrayData].getName
+  val genericArrayClass = classOf[GenericArrayData].getName
+  ctx.addMutableState("Object[]", arrayName,
+s"this.$arrayName = new Object[${numElements}];")
+
+  val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+val isNullAssignment = if (allowNull) {
+  s"$arrayName[$i] = null;"
+} else {
+  "throw new RuntimeException(\"Cannot use null as map key!\");"
+}
+eval.code + s"""
+ if (${eval.isNull}) {
+   $isNullAssignment
+ } else {
+   $arrayName[$i] = ${eval.value};
+ }
+   """
+  }
+
+  /*
+TODO: When we declare arrayDataName as GenericArrayData,
+   we have to solve the following exception
+  https://github.com/apache/spark/pull/13909/files#r93813725
+  */
+  ("",
+   assignments,
+   s"final $arrayClass $arrayDataName = new 
$genericArrayClass($arrayName);",
+   arrayDataName,
+   arrayName)
+} else {
+  val unsafeArrayClass = classOf[UnsafeArrayData].getName
+  val unsafeArraySizeInBytes =
+UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
+
ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * 
numElements)
+  val baseOffset = Platform.BYTE_ARRAY_OFFSET
+  ctx.addMutableState(unsafeArrayClass, arrayDataName, "");
+
+  val primitiveValueTypeName = ctx.primitiveTypeName(elementType)
+  val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+val isNullAssignment = if (allowNull) {
+  s"$arrayDataName.setNullAt($i);"
+} else {
+  "throw new RuntimeException(\"Cannot use null as map key!\");"
+}
+eval.code + s"""
+ if (${eval.isNull}) {
+   $isNullAssignment
+ } else {
+   $arrayDataName.set$primitiveValueTypeName($i, ${eval.value});
+ }
+   """
+  }
+
+  (s"""
+byte[] $arrayName = new byte[$unsafeArrayS

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-25 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93841229
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +57,107 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val (preprocess, assigns, postprocess, arrayData, array) =
+  GenArrayData.genCodeToCreateArrayData(ctx, et, evals, true)
+ev.copy(
+  code = preprocess + ctx.splitExpressions(ctx.INPUT_ROW, assigns) + 
postprocess,
+  value = arrayData,
+  isNull = "false")
   }
 
   override def prettyName: String = "array"
 }
 
+private [sql] object GenArrayData {
+  /**
+   * Return Java code pieces based on DataType and isPrimitive to allocate 
ArrayData class
+   *
+   * @param ctx a [[CodegenContext]]
+   * @param elementType data type of an underlying array
+   * @param elementsCode a set of [[ExprCode]] for each element of an 
underlying array
+   * @return (code pre-assignments, assignments to each array elements, 
code post-assignments,
+   *   arrayData name, underlying array name)
+   */
+  def genCodeToCreateArrayData(
+  ctx: CodegenContext,
+  elementType: DataType,
+  elementsCode: Seq[ExprCode],
+  allowNull: Boolean): (String, Seq[String], String, String, String) = 
{
+val arrayName = ctx.freshName("array")
+val arrayDataName = ctx.freshName("arrayData")
+val numElements = elementsCode.length
+
+if (!ctx.isPrimitiveType(elementType)) {
+  val arrayClass = classOf[ArrayData].getName
+  val genericArrayClass = classOf[GenericArrayData].getName
+  ctx.addMutableState("Object[]", arrayName,
+s"this.$arrayName = new Object[${numElements}];")
+
+  val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+val isNullAssignment = if (allowNull) {
+  s"$arrayName[$i] = null;"
+} else {
+  "throw new RuntimeException(\"Cannot use null as map key!\");"
+}
+eval.code + s"""
+ if (${eval.isNull}) {
+   $isNullAssignment
+ } else {
+   $arrayName[$i] = ${eval.value};
+ }
+   """
+  }
+
+  /*
+TODO: When we declare arrayDataName as GenericArrayData,
+   we have to solve the following exception
+  https://github.com/apache/spark/pull/13909/files#r93813725
+  */
+  ("",
+   assignments,
+   s"final $arrayClass $arrayDataName = new 
$genericArrayClass($arrayName);",
+   arrayDataName,
+   arrayName)
+} else {
+  val unsafeArrayClass = classOf[UnsafeArrayData].getName
+  val unsafeArraySizeInBytes =
+UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
+
ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * 
numElements)
+  val baseOffset = Platform.BYTE_ARRAY_OFFSET
+  ctx.addMutableState(unsafeArrayClass, arrayDataName, "");
+
+  val primitiveValueTypeName = ctx.primitiveTypeName(elementType)
+  val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+val isNullAssignment = if (allowNull) {
+  s"$arrayDataName.setNullAt($i);"
+} else {
+  "throw new RuntimeException(\"Cannot use null as map key!\");"
+}
+eval.code + s"""
+ if (${eval.isNull}) {
+   $isNullAssignment
+ } else {
+   $arrayDataName.set$primitiveValueTypeName($i, ${eval.value});
+ }
+   """
+  }
+
+  (s"""
+byte[] $arrayName = new byte[$unsafeArray

[GitHub] spark pull request #16255: [SPARK-18609][SQL]Fix when CTE with Join between ...

2016-12-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16255#discussion_r93841751
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -200,6 +200,8 @@ object RemoveAliasOnlyProject extends Rule[LogicalPlan] 
{
 case plan: Project if plan eq proj => plan.child
--- End diff --

what do you mean?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13909
  
**[Test build #70578 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70578/testReport)**
 for PR 13909 at commit 
[`293b344`](https://github.com/apache/spark/commit/293b344e761bc4b9c04891c02c702a374472345a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16320
  
**[Test build #70576 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70576/testReport)**
 for PR 16320 at commit 
[`308de12`](https://github.com/apache/spark/commit/308de12950599a6900766a76a0ea39ac72aba59f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...

2016-12-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16320
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70576/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...

2016-12-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16320
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15664
  
**[Test build #70577 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70577/testReport)**
 for PR 15664 at commit 
[`11f5874`](https://github.com/apache/spark/commit/11f587465c257ba194a157b57244f53ff5eb47fd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...

2016-12-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15664
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70577/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...

2016-12-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15664
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #12574: [SPARK-13857][ML][WIP] Add "recommend all" functionality...

2016-12-25 Thread debasish83

Github user debasish83 commented on the issue:

https://github.com/apache/spark/pull/12574
  
test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #12574: [SPARK-13857][ML][WIP] Add "recommend all" functionality...

2016-12-25 Thread debasish83

Github user debasish83 commented on the issue:

https://github.com/apache/spark/pull/12574
  
Can we close it ? Looks like SPARK-18235 opened up recommendForAll 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-12-25 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14452
  
@davies It is true that pushing down different predicates results in 
different CTE logical/physical plans. I spend some LOC changes in this to 
address that cases, i.e., preparing a disjunctive predicate for duplicated CTE 
with different predicates.

For Q64, a disjunctive predicate will be pushed down too. I am not sure 
what the problem is you mentioned.

Let me try to get and show the pushed down predicate.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16400: [SPARK-18941][SQL][DOC] Add a new behavior docume...

2016-12-25 Thread dongjoon-hyun

GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/16400

[SPARK-18941][SQL][DOC] Add a new behavior document on `CREATE/DROP TABLE` 
with `LOCATION`

## What changes were proposed in this pull request?

This PR adds a new behavior change description on `CREATE TABLE ... 
LOCATION` at `sql-programming-guide.md` clearly under `Upgrading From Spark SQL 
1.6 to 2.0`. This change is introduced at Apache Spark 2.0.0 as 
[SPARK-15276](https://issues.apache.org/jira/browse/SPARK-15276).

## How was this patch tested?

```
SKIP_API=1 jekyll build
```

**Newly Added Description**
https://cloud.githubusercontent.com/assets/9700541/21475905/d55c3e1e-cae6-11e6-8651-9bf2be53b6dd.png";>


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-18941

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16400.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16400


commit 3ea7860a3c030ba40ffda40d6e5c586ecce078c3
Author: Dongjoon Hyun 
Date:   2016-12-26T05:06:12Z

[SPARK-18941][SQL][DOC] Add a new behavior document on `CREATE/DROP TABLE` 
with `LOCATION`




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16400: [SPARK-18941][SQL][DOC] Add a new behavior document on `...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16400
  
**[Test build #70579 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70579/testReport)**
 for PR 16400 at commit 
[`3ea7860`](https://github.com/apache/spark/commit/3ea7860a3c030ba40ffda40d6e5c586ecce078c3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14473: [SPARK-16495] [MLlib]Add ADMM optimizer in mllib package

2016-12-25 Thread debasish83

Github user debasish83 commented on the issue:

https://github.com/apache/spark/pull/14473
  
ADMM is already available as a breeze solver (BFGS, OWLQN, 
NonlinearMinimizer) which is integrated with ml/mllib...It will be great if you 
can look into it and let me know if you need pointers in running comparisons 
with OWLQN:

https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala
This is implemented based on the paper you cited.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2016-12-25 Thread alokob

Github user alokob commented on the issue:

https://github.com/apache/spark/pull/16355
  
@imatiach-msft Did you find the dataset suitable. Is anything else needed 
from my side?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16400: [SPARK-18941][SQL][DOC] Add a new behavior document on `...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16400
  
**[Test build #70579 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70579/testReport)**
 for PR 16400 at commit 
[`3ea7860`](https://github.com/apache/spark/commit/3ea7860a3c030ba40ffda40d6e5c586ecce078c3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16400: [SPARK-18941][SQL][DOC] Add a new behavior document on `...

2016-12-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16400
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16400: [SPARK-18941][SQL][DOC] Add a new behavior document on `...

2016-12-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16400
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70579/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #9759: [SPARK-11753][SQL][test-hadoop2.2] Make allowNonNumericNu...

2016-12-25 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/9759
  
@srowen @rxin @zsxwing The Option json serialization issue 
(https://github.com/FasterXML/jackson-module-scala/issues/240) looks like fixed 
now. Do you think it is ok I try to upgrade Jackson now?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93844552
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +57,107 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val (preprocess, assigns, postprocess, arrayData, array) =
+  GenArrayData.genCodeToCreateArrayData(ctx, et, evals, true)
+ev.copy(
+  code = preprocess + ctx.splitExpressions(ctx.INPUT_ROW, assigns) + 
postprocess,
+  value = arrayData,
+  isNull = "false")
   }
 
   override def prettyName: String = "array"
 }
 
+private [sql] object GenArrayData {
+  /**
+   * Return Java code pieces based on DataType and isPrimitive to allocate 
ArrayData class
+   *
+   * @param ctx a [[CodegenContext]]
+   * @param elementType data type of an underlying array
+   * @param elementsCode a set of [[ExprCode]] for each element of an 
underlying array
+   * @return (code pre-assignments, assignments to each array elements, 
code post-assignments,
+   *   arrayData name, underlying array name)
+   */
+  def genCodeToCreateArrayData(
+  ctx: CodegenContext,
+  elementType: DataType,
+  elementsCode: Seq[ExprCode],
+  allowNull: Boolean): (String, Seq[String], String, String, String) = 
{
+val arrayName = ctx.freshName("array")
+val arrayDataName = ctx.freshName("arrayData")
+val numElements = elementsCode.length
+
+if (!ctx.isPrimitiveType(elementType)) {
+  val arrayClass = classOf[ArrayData].getName
+  val genericArrayClass = classOf[GenericArrayData].getName
+  ctx.addMutableState("Object[]", arrayName,
+s"this.$arrayName = new Object[${numElements}];")
+
+  val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+val isNullAssignment = if (allowNull) {
+  s"$arrayName[$i] = null;"
+} else {
+  "throw new RuntimeException(\"Cannot use null as map key!\");"
+}
+eval.code + s"""
+ if (${eval.isNull}) {
+   $isNullAssignment
+ } else {
+   $arrayName[$i] = ${eval.value};
+ }
+   """
+  }
+
+  /*
+TODO: When we declare arrayDataName as GenericArrayData,
--- End diff --

it's a very minor problem, let's not bother about it and remove this todo.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93844573
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +57,108 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val (preprocess, assigns, postprocess, arrayData, _) =
+  GenArrayData.genCodeToCreateArrayData(ctx, et, evals, true)
+ev.copy(
+  code = preprocess + ctx.splitExpressions(ctx.INPUT_ROW, assigns) + 
postprocess,
+  value = arrayData,
+  isNull = "false")
   }
 
   override def prettyName: String = "array"
 }
 
+private [sql] object GenArrayData {
+  /**
+   * Return Java code pieces based on DataType and isPrimitive to allocate 
ArrayData class
+   *
+   * @param ctx a [[CodegenContext]]
+   * @param elementType data type of underlying array elements
+   * @param elementsCode a set of [[ExprCode]] for each element of an 
underlying array
+   * @param allowNull if to assign null value to an array element is 
allowed
+   * @return (code pre-assignments, assignments to each array elements, 
code post-assignments,
+   *   arrayData name, underlying array name)
+   */
+  def genCodeToCreateArrayData(
+  ctx: CodegenContext,
+  elementType: DataType,
+  elementsCode: Seq[ExprCode],
+  allowNull: Boolean): (String, Seq[String], String, String, String) = 
{
+val arrayName = ctx.freshName("array")
+val arrayDataName = ctx.freshName("arrayData")
+val numElements = elementsCode.length
+
+if (!ctx.isPrimitiveType(elementType)) {
+  val arrayClass = classOf[ArrayData].getName
--- End diff --

this is not needed, `ArrayData` is imported by default.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93844582
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -56,33 +57,108 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
-val values = ctx.freshName("values")
-ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-ev.copy(code = s"""
-  this.$values = new Object[${children.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-children.zipWithIndex.map { case (e, i) =>
-  val eval = e.genCode(ctx)
-  eval.code + s"""
-if (${eval.isNull}) {
-  $values[$i] = null;
-} else {
-  $values[$i] = ${eval.value};
-}
-   """
-}) +
-  s"""
-final ArrayData ${ev.value} = new $arrayClass($values);
-this.$values = null;
-  """, isNull = "false")
+val et = dataType.elementType
+val evals = children.map(e => e.genCode(ctx))
+val (preprocess, assigns, postprocess, arrayData, _) =
+  GenArrayData.genCodeToCreateArrayData(ctx, et, evals, true)
+ev.copy(
+  code = preprocess + ctx.splitExpressions(ctx.INPUT_ROW, assigns) + 
postprocess,
+  value = arrayData,
+  isNull = "false")
   }
 
   override def prettyName: String = "array"
 }
 
+private [sql] object GenArrayData {
+  /**
+   * Return Java code pieces based on DataType and isPrimitive to allocate 
ArrayData class
+   *
+   * @param ctx a [[CodegenContext]]
+   * @param elementType data type of underlying array elements
+   * @param elementsCode a set of [[ExprCode]] for each element of an 
underlying array
+   * @param allowNull if to assign null value to an array element is 
allowed
+   * @return (code pre-assignments, assignments to each array elements, 
code post-assignments,
+   *   arrayData name, underlying array name)
+   */
+  def genCodeToCreateArrayData(
+  ctx: CodegenContext,
+  elementType: DataType,
+  elementsCode: Seq[ExprCode],
+  allowNull: Boolean): (String, Seq[String], String, String, String) = 
{
+val arrayName = ctx.freshName("array")
+val arrayDataName = ctx.freshName("arrayData")
+val numElements = elementsCode.length
+
+if (!ctx.isPrimitiveType(elementType)) {
+  val arrayClass = classOf[ArrayData].getName
+  val genericArrayClass = classOf[GenericArrayData].getName
+  ctx.addMutableState("Object[]", arrayName,
+s"this.$arrayName = new Object[${numElements}];")
+
+  val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+val isNullAssignment = if (allowNull) {
+  s"$arrayName[$i] = null;"
+} else {
+  "throw new RuntimeException(\"Cannot use null as map key!\");"
+}
+eval.code + s"""
+ if (${eval.isNull}) {
+   $isNullAssignment
+ } else {
+   $arrayName[$i] = ${eval.value};
+ }
+   """
+  }
+
+  /*
+TODO: When we declare arrayDataName as GenericArrayData,
+   we have to solve the following exception
+  https://github.com/apache/spark/pull/13909/files#r93813725
+  */
+  ("",
+   assignments,
+   s"final $arrayClass $arrayDataName = new 
$genericArrayClass($arrayName);",
+   arrayDataName,
+   arrayName)
+} else {
+  val unsafeArrayClass = classOf[UnsafeArrayData].getName
--- End diff --

this is not needed, `UnsafeArrayData` is imported by default.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93844718
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -133,49 +209,26 @@ case class CreateMap(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
 val mapClass = classOf[ArrayBasedMapData].getName
-val keyArray = ctx.freshName("keyArray")
-val valueArray = ctx.freshName("valueArray")
-ctx.addMutableState("Object[]", keyArray, s"this.$keyArray = null;")
-ctx.addMutableState("Object[]", valueArray, s"this.$valueArray = 
null;")
-
-val keyData = s"new $arrayClass($keyArray)"
-val valueData = s"new $arrayClass($valueArray)"
-ev.copy(code = s"""
-  $keyArray = new Object[${keys.size}];
-  $valueArray = new Object[${values.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-keys.zipWithIndex.map { case (key, i) =>
-  val eval = key.genCode(ctx)
-  s"""
-${eval.code}
-if (${eval.isNull}) {
-  throw new RuntimeException("Cannot use null as map key!");
-} else {
-  $keyArray[$i] = ${eval.value};
-}
-  """
-}) +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-values.zipWithIndex.map { case (value, i) =>
-  val eval = value.genCode(ctx)
-  s"""
-${eval.code}
-if (${eval.isNull}) {
-  $valueArray[$i] = null;
-} else {
-  $valueArray[$i] = ${eval.value};
-}
-  """
-}) +
+val MapType(keyDt, valueDt, _) = dataType
+val evalKeys = keys.map(e => e.genCode(ctx))
+val evalValues = values.map(e => e.genCode(ctx))
+val (preprocessKeyData, assignKeys, postprocessKeyData, keyArrayData, 
keyArray) =
+  GenArrayData.genCodeToCreateArrayData(ctx, keyDt, evalKeys, false)
+val (preprocessValueData, assignValues, postprocessValueData, 
valueArrayData, valueArray) =
--- End diff --

are `keyArray` and `valueArray` used? I think we don't need to return the 
array name in `genCodeToCreateArrayData`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2016-12-25 Thread wzhfy

GitHub user wzhfy opened a pull request:

https://github.com/apache/spark/pull/16401

[SPARK-18998] [SQL] Add a cbo conf to switch between default statistics and 
estimated statistics

## What changes were proposed in this pull request?

We add a cbo configuration to switch between default stats and estimated 
stats.
We also define a new statistics method `planStats` in LogicalPlan with conf 
as its parameter, in order to pass the cbo switch and other estimation related 
configurations in the future. `planStats` is used on the caller sides (i.e. 
Optimizer and Strategies) to make transformation decisions based on stats.

## How was this patch tested?

Add a test case using a dummy LogicalPlan.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wzhfy/spark cboSwitch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16401.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16401


commit 53c1b26e9fc7c253b1654145f910e7881db34de7
Author: Zhenhua Wang 
Date:   2016-12-24T15:43:53Z

add cbo switch




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16401
  
**[Test build #70580 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70580/testReport)**
 for PR 16401 at commit 
[`53c1b26`](https://github.com/apache/spark/commit/53c1b26e9fc7c253b1654145f910e7881db34de7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...

2016-12-25 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15664#discussion_r93844970
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -112,7 +112,25 @@ object JdbcUtils extends Logging {
*/
   def insertStatement(conn: Connection, table: String, rddSchema: 
StructType, dialect: JdbcDialect)
   : PreparedStatement = {
-val columns = rddSchema.fields.map(x => 
dialect.quoteIdentifier(x.name)).mkString(",")
+// Use database column names instead of RDD schema column names
+val tableSchemaQuery = 
conn.prepareStatement(dialect.getSchemaQuery(table))
--- End diff --

We can get the table schema [when we checking whether the table 
exists](https://github.com/apache/spark/blob/fb07bbe575aabe68422fd3a31865101fb7fa1722/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala#L63).
 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16391: [SPARK-18990][SQL] make DatasetBenchmark fairer f...

2016-12-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16391#discussion_r93844973
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
@@ -170,36 +176,39 @@ object DatasetBenchmark {
 val benchmark3 = aggregate(spark, numRows)
 
 /*
-OpenJDK 64-Bit Server VM 1.8.0_91-b14 on Linux 
3.10.0-327.18.2.el7.x86_64
-Intel Xeon E3-12xx v2 (Ivy Bridge)
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 on Mac OS X 10.12.1
+Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
+
 back-to-back map:Best/Avg Time(ms)
Rate(M/s)   Per Row(ns)   Relative
 

-RDD   3448 / 3646 29.0 
 34.5   1.0X
-DataFrame 2647 / 3116 37.8 
 26.5   1.3X
-Dataset   4781 / 5155 20.9 
 47.8   0.7X
+RDD   3963 / 3976 25.2 
 39.6   1.0X
+DataFrame  826 /  834121.1 
  8.3   4.8X
+Dataset   5178 / 5198 19.3 
 51.8   0.8X
--- End diff --

ah, scala compiler is smart! I think we can create a ticket to optimize 
this, i.e. call the primitive apply version, and update the benchmark result.

For byte code analysis, let's discuss about it in the ticket later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...

2016-12-25 Thread wzhfy

Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/16401
  
cc @rxin @cloud-fan @viirya


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16388: [SPARK-18989][SQL] DESC TABLE should not fail with forma...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16388
  
**[Test build #70581 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70581/testReport)**
 for PR 16388 at commit 
[`1f277f4`](https://github.com/apache/spark/commit/1f277f4fcc1b19bb94e2a9debd1fe7f9786e7de4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15505: [SPARK-17931][CORE] taskScheduler has some unneeded seri...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15505
  
**[Test build #70582 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70582/testReport)**
 for PR 15505 at commit 
[`be912cb`](https://github.com/apache/spark/commit/be912cb2650364fcd12c45ad5a63a23f1a158779).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16383: [SPARK-18980][SQL] implement Aggregator with TypedImpera...

2016-12-25 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16383
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16383: [SPARK-18980][SQL] implement Aggregator with TypedImpera...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16383
  
**[Test build #70583 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70583/testReport)**
 for PR 16383 at commit 
[`0a73fe2`](https://github.com/apache/spark/commit/0a73fe208ed7e211daf75dd2268aec91868c7ee3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2016-12-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r93846292
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -642,6 +642,13 @@ object SQLConf {
   .doubleConf
   .createWithDefault(0.05)
 
+  val CBO_ENABLED =
+SQLConfigBuilder("spark.sql.cbo.enabled")
+  .internal()
+  .doc("Enables CBO for estimation of plan statistics when set true.")
+  .booleanConf
+  .createWithDefault(false)
--- End diff --

shall we enable it by default? cc @hvanhovell @rxin


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16399: [SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [BACKPORT...

2016-12-25 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16399
  
thanks, merging to 2.0!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13909
  
**[Test build #70578 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70578/testReport)**
 for PR 13909 at commit 
[`293b344`](https://github.com/apache/spark/commit/293b344e761bc4b9c04891c02c702a374472345a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2016-12-25 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15819
  
https://github.com/apache/spark/pull/16399 has been merged, feel free if 
you wanna backport to 1.6


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13909
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70578/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13909
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...

2016-12-25 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15664#discussion_r93846787
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -112,7 +112,25 @@ object JdbcUtils extends Logging {
*/
   def insertStatement(conn: Connection, table: String, rddSchema: 
StructType, dialect: JdbcDialect)
   : PreparedStatement = {
-val columns = rddSchema.fields.map(x => 
dialect.quoteIdentifier(x.name)).mkString(",")
+// Use database column names instead of RDD schema column names
+val tableSchemaQuery = 
conn.prepareStatement(dialect.getSchemaQuery(table))
+var columns: String = ""
+try {
+  val tableSchema = getSchema(tableSchemaQuery.executeQuery(), dialect)
+  val nameMap = tableSchema.fields.map(f => f.name -> f.name).toMap
+  val lowercaseNameMap = tableSchema.fields.map(f => 
f.name.toLowerCase -> f.name).toMap
+  columns = rddSchema.fields.map { x =>
+if (nameMap.isDefinedAt(x.name)) {
+  dialect.quoteIdentifier(x.name)
+} else if (lowercaseNameMap.isDefinedAt(x.name.toLowerCase)) {
+  dialect.quoteIdentifier(lowercaseNameMap(x.name.toLowerCase))
+} else {
+  throw new SQLException(s"""Column "${x.name}" not found""")
+}
--- End diff --

The name resolution should be still controlled by 
`spark.sql.caseSensitive`, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...

2016-12-25 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15664#discussion_r93847101
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -112,7 +112,25 @@ object JdbcUtils extends Logging {
*/
   def insertStatement(conn: Connection, table: String, rddSchema: 
StructType, dialect: JdbcDialect)
   : PreparedStatement = {
-val columns = rddSchema.fields.map(x => 
dialect.quoteIdentifier(x.name)).mkString(",")
+// Use database column names instead of RDD schema column names
+val tableSchemaQuery = 
conn.prepareStatement(dialect.getSchemaQuery(table))
+var columns: String = ""
+try {
+  val tableSchema = getSchema(tableSchemaQuery.executeQuery(), dialect)
+  val nameMap = tableSchema.fields.map(f => f.name -> f.name).toMap
+  val lowercaseNameMap = tableSchema.fields.map(f => 
f.name.toLowerCase -> f.name).toMap
+  columns = rddSchema.fields.map { x =>
+if (nameMap.isDefinedAt(x.name)) {
+  dialect.quoteIdentifier(x.name)
+} else if (lowercaseNameMap.isDefinedAt(x.name.toLowerCase)) {
+  dialect.quoteIdentifier(lowercaseNameMap(x.name.toLowerCase))
+} else {
+  throw new SQLException(s"""Column "${x.name}" not found""")
+}
+  }.mkString(",")
+} finally {
+  tableSchemaQuery.close()
+}
 val placeholders = rddSchema.fields.map(_ => "?").mkString(",")
 val sql = s"INSERT INTO $table ($columns) VALUES ($placeholders)"
 conn.prepareStatement(sql)
--- End diff --

Can we build the INSERT SQL statement in `saveTable` based on the schema? 
No need to prepare the generated statement in `saveTable`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13909
  
**[Test build #70584 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70584/testReport)**
 for PR 13909 at commit 
[`69d5e33`](https://github.com/apache/spark/commit/69d5e33d2035fc5f6f4dfec65bde60c7dfc39548).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-25 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13909#discussion_r93847467
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -133,49 +209,26 @@ case class CreateMap(children: Seq[Expression]) 
extends Expression {
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val arrayClass = classOf[GenericArrayData].getName
 val mapClass = classOf[ArrayBasedMapData].getName
-val keyArray = ctx.freshName("keyArray")
-val valueArray = ctx.freshName("valueArray")
-ctx.addMutableState("Object[]", keyArray, s"this.$keyArray = null;")
-ctx.addMutableState("Object[]", valueArray, s"this.$valueArray = 
null;")
-
-val keyData = s"new $arrayClass($keyArray)"
-val valueData = s"new $arrayClass($valueArray)"
-ev.copy(code = s"""
-  $keyArray = new Object[${keys.size}];
-  $valueArray = new Object[${values.size}];""" +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-keys.zipWithIndex.map { case (key, i) =>
-  val eval = key.genCode(ctx)
-  s"""
-${eval.code}
-if (${eval.isNull}) {
-  throw new RuntimeException("Cannot use null as map key!");
-} else {
-  $keyArray[$i] = ${eval.value};
-}
-  """
-}) +
-  ctx.splitExpressions(
-ctx.INPUT_ROW,
-values.zipWithIndex.map { case (value, i) =>
-  val eval = value.genCode(ctx)
-  s"""
-${eval.code}
-if (${eval.isNull}) {
-  $valueArray[$i] = null;
-} else {
-  $valueArray[$i] = ${eval.value};
-}
-  """
-}) +
+val MapType(keyDt, valueDt, _) = dataType
+val evalKeys = keys.map(e => e.genCode(ctx))
+val evalValues = values.map(e => e.genCode(ctx))
+val (preprocessKeyData, assignKeys, postprocessKeyData, keyArrayData, 
keyArray) =
+  GenArrayData.genCodeToCreateArrayData(ctx, keyDt, evalKeys, false)
+val (preprocessValueData, assignValues, postprocessValueData, 
valueArrayData, valueArray) =
--- End diff --

Oh, good catch


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16402: [SPARK-18999][SQL][minor] simplify Literal codege...

2016-12-25 Thread cloud-fan

GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/16402

[SPARK-18999][SQL][minor] simplify Literal codegen

## What changes were proposed in this pull request?

`Literal` can use `CodegenContex.addReferenceObj` to implement codegen, 
instead of `CodegenFallback`.  This can also simplify the generated code a 
little bit, before we will generate: `((Expression) references[1]).eval(null)`, 
now it's just `references[1]`.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16402.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16402


commit 60108571773d9d196cd491512a5cbcd01d878afa
Author: Wenchen Fan 
Date:   2016-12-26T07:22:32Z

simplify Literal codegen




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16402: [SPARK-18999][SQL][minor] simplify Literal codegen

2016-12-25 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16402
  
cc @gatorsmile 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16402: [SPARK-18999][SQL][minor] simplify Literal codegen

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16402
  
**[Test build #70585 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70585/testReport)**
 for PR 16402 at commit 
[`6010857`](https://github.com/apache/spark/commit/60108571773d9d196cd491512a5cbcd01d878afa).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #16345: [SPARK-17755][Core]Use workerRef to send RegisterWorkerR...

2016-12-25 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/16345
  
Thanks. Merging to master. This seems risky to backport to other branches.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13909
  
**[Test build #70584 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70584/testReport)**
 for PR 13909 at commit 
[`69d5e33`](https://github.com/apache/spark/commit/69d5e33d2035fc5f6f4dfec65bde60c7dfc39548).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 >

1 - 100 of 103 matches

Mail list logo