[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-26 Thread mn-mikke
Github user mn-mikke commented on the issue:

https://github.com/apache/spark/pull/20858
  
retest please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-26 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20858
  
ok to test


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20858
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20858
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88596/
Test FAILed.


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20858
  
**[Test build #88596 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88596/testReport)** for PR 20858 at commit [`bb46c3d`](https://github.com/apache/spark/commit/bb46c3d3d3e18a9e05ddb6fe6efda3c25c2711a4).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedConcat(children: Seq[Expression]) extends Expression`


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20858
  
**[Test build #88596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88596/testReport)** for PR 20858 at commit [`bb46c3d`](https://github.com/apache/spark/commit/bb46c3d3d3e18a9e05ddb6fe6efda3c25c2711a4).


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-26 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20858
  
@mn-mikke Could you update the PR title?


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-26 Thread mn-mikke
Github user mn-mikke commented on the issue:

https://github.com/apache/spark/pull/20858
  
Merged concat and concat_arrays functions into one via an unresolved 
expression and subsequent resolution. Do you have any objections to this 
approach?
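
A minimal sketch of the "unresolved expression, then resolution" idea, written as a toy model rather than against Spark's Catalyst classes. Every name here (`Column`, `ConcatStrings`, `ConcatArrays`, `resolve`) is hypothetical and only illustrates how a single `concat` entry point could be dispatched once the input types are known:

```scala
// Toy model only: these are NOT Spark's Catalyst classes.
object UnresolvedConcatSketch {
  sealed trait DataType
  case object StringType extends DataType
  case object IntType extends DataType
  final case class ArrayType(elementType: DataType) extends DataType

  sealed trait Expr
  final case class Column(name: String, dataType: DataType) extends Expr
  // Placeholder created for every concat(...) call before input types are known.
  final case class UnresolvedConcat(children: Seq[Column]) extends Expr
  final case class ConcatStrings(children: Seq[Column]) extends Expr
  final case class ConcatArrays(children: Seq[Column]) extends Expr

  // Replaces the placeholder with a concrete expression based on the input types.
  def resolve(expr: Expr): Either[String, Expr] = expr match {
    case UnresolvedConcat(children) =>
      val types = children.map(_.dataType)
      if (types.nonEmpty && types.forall(_.isInstanceOf[ArrayType]))
        Right(ConcatArrays(children))
      else if (types.forall(_ == StringType))
        Right(ConcatStrings(children)) // an empty argument list also lands here
      else
        Left(s"concat expects all strings or all arrays, got: ${types.mkString(", ")}")
    case resolved => Right(resolved) // anything else is left untouched
  }
}
```

In this sketch, mixing a string column with an array column yields a `Left` with an error message instead of a resolved expression.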


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-25 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20858
  
Also, `postgresql` has the function `array_cat` for concatenating arrays, 
so it might be better to make the behaviour the same as the `postgresql` one:
https://www.postgresql.org/docs/10/static/functions-array.html


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-25 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20858
  
Should this function handle arrays with different (but compatible) element types?
```
scala> sql("select concat_arrays(array(1L, 2L), array(3, 4))").show
org.apache.spark.sql.AnalysisException: cannot resolve 'concat_arrays(array(1L, 2L), array(3, 4))' due to data type mismatch: input to function concat_arrays should all be the same type, but it's [array, array]; line 1 pos 7;
'Project [unresolvedalias(concat_arrays(array(1, 2), array(3, 4)), None)]
+- OneRowRelation
```
Also, could you add more tests for this case in `SQLQueryTestSuite`? 
We can probably add a new test file such as `concat_arrays.sql` in 
`typeCoercion.native`.
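
As an illustration of what "compatible" could mean for `array(1L, 2L)` and `array(3, 4)`, here is a minimal sketch that picks a common element type by numeric widening. It is a toy model, not Spark's type coercion rules; the type names and the widening order are assumptions made for the example:

```scala
// Toy model only: a hypothetical widening order, not Spark's TypeCoercion.
object ArrayCoercionSketch {
  sealed trait ElemType
  case object IntType extends ElemType
  case object LongType extends ElemType
  case object DoubleType extends ElemType
  case object StringType extends ElemType

  // Assumed precedence: a type may be widened to any type that follows it.
  private val numericPrecedence: Seq[ElemType] = Seq(IntType, LongType, DoubleType)

  // Wider of two element types, or None if they are incompatible.
  def widerType(a: ElemType, b: ElemType): Option[ElemType] =
    if (a == b) Some(a)
    else {
      val ia = numericPrecedence.indexOf(a)
      val ib = numericPrecedence.indexOf(b)
      if (ia >= 0 && ib >= 0) Some(numericPrecedence(math.max(ia, ib))) else None
    }

  // Common element type across all input arrays, folding pairwise.
  def commonElementType(types: Seq[ElemType]): Option[ElemType] = types match {
    case Seq() => None
    case head +: tail =>
      tail.foldLeft(Option(head)) { (acc, t) => acc.flatMap(widerType(_, t)) }
  }
}
```

Under these assumptions, a `LongType` array and an `IntType` array share the element type `LongType`, so the `IntType` elements would be cast before concatenation, while an `IntType` array and a `StringType` array have no common type.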


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-25 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20858
  
The current code can't handle inner arrays:
```
scala> sql("select concat_arrays(array(1, 2, array(3, 4)), array(5, 6, 7, 8))").show
org.apache.spark.sql.AnalysisException: cannot resolve 'array(1, 2, array(3, 4))' due to data type mismatch: input to function array should all be the same type, but it's [int, int, array]; line 1 pos 21;
'Project [unresolvedalias('concat_arrays(array(1, 2, array(3, 4)), array(5, 6, 7, 8)), None)]
+- OneRowRelation
```

IMHO, it's better to make this function behave the same as the postgresql one: 
https://www.postgresql.org/docs/10/static/functions-array.html
Could you update the code to handle this?


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-23 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20858
  
ok, I'll check later!


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20858
  
@maropu Maybe you can help @mn-mikke review this PR? I will open an 
umbrella JIRA for the built-in functions we plan to add in Apache Spark 2.4. 
The list includes several functions for operating on nested data.


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-20 Thread mn-mikke
Github user mn-mikke commented on the issue:

https://github.com/apache/spark/pull/20858
  
@maropu What other libraries do you mean? I'm not aware of any library 
providing this functionality on top of Spark SQL.

When using Spark SQL as an ETL tool for structured and nested data, people 
are forced to use UDFs for transforming arrays since the current API for array 
columns is lacking. This approach has several drawbacks:
- poor code readability
- Catalyst cannot see inside UDFs when performing optimizations
- no way to track the data lineage of the transformation (a key aspect 
for the financial industry, see [Spline](https://absaoss.github.io/spline/) and 
[Spline paper](https://github.com/AbsaOSS/spline/releases/download/release%2F0.2.7/Spline_paper_IEEE_2018.pdf))

So my colleagues and I decided to extend the current Spark SQL API with 
well-known collection functions like concat, flatten, zipWithIndex, etc. We 
don't want to keep this functionality just in our fork of Spark, but would like 
to share it with others.



---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-19 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20858
  
Thanks for this work! One question: why do you think we need to support 
this API natively in Spark? Do other libraries support this as a first-class feature?


---




[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20858
  
Can one of the admins verify this patch?


---



