[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18652
  
**[Test build #79819 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79819/testReport)**
 for PR 18652 at commit 
[`73f9827`](https://github.com/apache/spark/commit/73f982763a5d2991c51903aaad19c9ce6d06a49d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18652
  
**[Test build #79820 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79820/testReport)**
 for PR 18652 at commit 
[`951124f`](https://github.com/apache/spark/commit/951124fccbb3a0d15066c4c8db1c6805af686384).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18652
  
**[Test build #79821 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79821/testReport)**
 for PR 18652 at commit 
[`b80d2fc`](https://github.com/apache/spark/commit/b80d2fca6362d63a5eecf43a40b30040d4ac392b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18652
  
**[Test build #79819 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79819/testReport)**
 for PR 18652 at commit 
[`73f9827`](https://github.com/apache/spark/commit/73f982763a5d2991c51903aaad19c9ce6d06a49d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18652
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79819/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18652
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18652
  
**[Test build #79820 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79820/testReport)**
 for PR 18652 at commit 
[`951124f`](https://github.com/apache/spark/commit/951124fccbb3a0d15066c4c8db1c6805af686384).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18652
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18652
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79820/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18652
  
**[Test build #79821 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79821/testReport)**
 for PR 18652 at commit 
[`b80d2fc`](https://github.com/apache/spark/commit/b80d2fca6362d63a5eecf43a40b30040d4ac392b).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18652
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79821/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18652
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18652
  
**[Test build #79827 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79827/testReport)**
 for PR 18652 at commit 
[`b80d2fc`](https://github.com/apache/spark/commit/b80d2fca6362d63a5eecf43a40b30040d4ac392b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-21 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18652
  
retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18652
  
**[Test build #79827 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79827/testReport)**
 for PR 18652 at commit 
[`b80d2fc`](https://github.com/apache/spark/commit/b80d2fca6362d63a5eecf43a40b30040d4ac392b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18652
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18652
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79827/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-21 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18652
  
@viirya I have not start reading the comments and the codes carefully. Just 
want to confirm whether the code changes in this PR follow what Hive is doing 
when we turn on the flag? If not, what is the behavior difference? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-25 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18652
  
@gatorsmile When the flag is enabled, we don't follow Hive on 
non-deterministic join conditions.

The difference are:

* Hive allows non-deterministic expressions in equi join keys and other 
join conditions. We only allow the kind of expressions as equi join keys.
* From the inspection on Hive, non-deterministic expressions when used as 
join conditions, are not treated in different way than deterministic ones. We 
pull non-deterministic equi join keys from Join operators.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18652
  
Then, will this PR resolve the migration issue from Hive workloads?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-26 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18652
  
It is a good question. Based on previous discussion, I think Join operator 
has no unique result in the non-deterministic case. The migration issue from 
Hive is because this kind of queries can't run by current SparkSQL. Whether we 
exactly follow Hive behavior is not real pain, if I didn't miss something.

Take the non-deterministic join in the query seen in the discussion thread 
at 
http://apache-spark-developers-list.1001551.n3.nabble.com/SQL-Syntax-quot-case-when-quot-doesn-t-be-supported-in-JOIN-tc21953.html#a21973
 as an example:

SELECT a.col1
FROM tbl1 a
LEFT OUTER JOIN tbl2 b
ON
CASE
WHEN a.col2 IS NULL
THEN cast(rand(9)*1000 - 99 as string)
ELSE
a.col2 END
= b.col3;

Where the different join result (the randomized value replacing NULL values 
of a.col2) actually doesn't matter, because it only retain a.col1 finally. The 
purpose of this query is to mitigate skew (huge NULLs) data when joining.

That said the columns from non-deterministic join conditions are not really 
useful at this cases. I even think it may be also the reason Hive doesn't put 
special consideration on non-deterministic expression when joining. As there's 
no unique result, users go to take care what they do with it.

I am not sure if I convey the idea clearly.

As an analogy, for a query like `SELECT a.col1 FROM table a WHERE rand() > 
0.5`. Even our rand generates different numbers than Hive so the final result 
is different. It still helps migration issue from Hive workloads using rand.


 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org